ABSTRACT 

This paper studies the complexity of evaluating functional 
query languages for complex values such as monad algebra 
and the recursion-free fragment of XQuery. 

We show that monad algebra with equality restricted to 
atomic values is complete for the class TA[2^{O(n)}, O(n)] of 
problems solvable in linear exponential time with a linear 
number of alternations. The monotone fragment of monad 
algebra with atomic value equality but without negation is 
complete for nondeterministic exponential time. For monad 
algebra with deep equality, we establish TA[2^{O(n)}, O(n)] lower 
and exponential-space upper bounds. 

Then we study a fragment of XQuery, Core XQuery, that 
seems to incorporate all the features of a query language on 
complex values that are traditionally deemed essential. A 
close connection between monad algebra on lists and Core 
XQuery (with “child” as the only axis) is exhibited, and it is 
shown that these languages are expressively equivalent up to 
representation issues. We show that Core XQuery is just as 
hard as monad algebra w.r.t. combined complexity, and that 
it is in TC^0 if the query is assumed fixed. 

1. INTRODUCTION 

Complex values form part of various data models for advanced 
database applications, such as object-oriented, object-relational, 
and semistructured data models. A large amount 
of theoretical work on query languages for complex values 
has been done, and this has laid the foundations for 
object-oriented query languages as well as SQL 1999 or XQuery. 

Earlier complexity studies on query languages for complex 
values have almost entirely focused on logic-based and, 
particularly, logic programming-based query languages, 
and on fixpoint languages. However, the query languages 
considered by many researchers to be most natural 
for complex values (such as complex value algebra without 
powerset, its syntactic variant monad algebra, 
and XQuery) are functional. 

Monad algebra. Previous research has shown monad algebra 
to be expressively equivalent to a number of other important 
complex-value query languages, such as nested relational 
algebra and complex value algebra without powerset. 
(Complex value algebra with powerset can 
take hyperexponential runtime. Queries that really need the 
powerset operator are usually too costly to evaluate.) 

Since some of these languages were developed driven by 
practical requirements rather than from first principles as is 
the case for monad algebra, it appears that the expressiveness 
of these languages on complex values is “the right one” 
to many researchers and plays a role analogous to that of 
the power of first-order logic (or relational algebra) on the 
relational model. 

One known result [38] is that monad algebra is in TC^0 
w.r.t. data complexity (i.e., if the query is assumed fixed). 
However, the complexity of monad algebra if the 
query is assumed variable (query/combined complexity [40]) 
is open. In this paper, we study the complexity of monad 
algebra under the latter assumption. 

XQuery. XQuery is destined to become the dominant 
data-transformation query language for XML data and to 
take a role analogous to the one occupied by SQL for 
relational databases. 

It is folklore that full XQuery is Turing-complete, but it is 
also obvious that queries without recursion are guaranteed 
to terminate in straightforward functional implementations 
of the XQuery language. Recursion in XQuery is rarely used 
in practice; recursive XML transformations 
are usually implemented in XSLT. 

In essence, XQuery is a quite natural typed functional 
programming language for XML; still it is sometimes 
criticized by the research community as huge and clumsy. In 
this paper we study a substantial recursion-free fragment of 
XQuery, which we call Core XQuery^1. It seems that Core 
XQuery contains all and only the features one would expect 
from a functional query language for unranked trees in the 
spirit of complex-value algebra without powerset. 

Little foundational research on XQuery has been done to 
date. There are only some cautious first attempts at finding 
clean formalizations of and algebras for the language. 
Most other recent work has focused on engineering 
good query processors for XQuery. 

In this paper, we attempt a first closer look at the 
complexity of XQuery, or more precisely, of the Core XQuery 


^1 This fragment is not to be confused with the XQuery Core 
[44], which is a much larger fragment of XQuery that is also 
harder to study. 




                          | with negation               | without negation 
deep equality             | in EXPSPACE; TA[2^{O(n)}, O(n)]-hard (both columns) 
equality on atomic values | TA[2^{O(n)}, O(n)]-complete | NEXPTIME-complete 


Table 1: Summary of results on query/combined complexity for monad algebra and Core XQuery. 


fragment. We attempt to do this in a principled manner, 
establishing connections to earlier, well-studied formalisms 
for functional queries on complex-value databases. 
Indeed, our results on the complexity of monad 
algebra quite directly yield a characterization of the 
complexity of Core XQuery. 

The technical contributions of this paper are as follows. 

• We show that monad algebra on sets, lists and bags is 
complete for TA[2^{O(n)}, O(n)] w.r.t. combined complexity 
in the presence of negation and equality on atomic 
values. 

• We show that monad algebra on sets, lists and bags 
with equality on atomic values but without negation 
is NEXPTIME-complete. 

• For the case of monad algebra with deep equality, we 
obtain an EXPSPACE upper bound. A TA[2^{O(n)}, O(n)] 
lower bound follows from the fact that negation is 
easily definable using deep equality. 

• We introduce the Core XQuery language, a simple yet 
powerful nonrecursive fragment of XQuery. 

• We exhibit a close connection between XQuery and 
monad algebra on lists and show that the Core XQuery 
queries that use only the child axis for navigation in 
data trees capture monad algebra on lists up to 
representation issues.^2 

The established mappings are efficiently computable. 
This allows us to make use of our complexity 
characterization of monad algebra, but it also gives a very 
concise formal semantics to Core XQuery. 

• We show that if equality is restricted to atomic values, 
Core XQuery is complete for TA[2^{O(n)}, O(n)]. 

• Core XQuery with deep equality is in EXPSPACE and 
hard for the class TA[2^{O(n)}, O(n)]. Since we can again 
very directly express negation using deep equality, this 
result holds even if negation and universal quantification 
(“every”) are ruled out from the language. 

• We show that the monotone fragment of Core XQuery 
(without negation and with equality restricted to atomic 
values) is NEXPTIME-complete. 

• Finally, we show that Core XQuery is in TC^0 w.r.t. 
data complexity. 

Table 1 summarizes our complexity results for query and 
combined complexity. 

^2 Monad algebra on nested lists (which are in fact the same 
as unranked, ordered, unlabeled trees) is incomparable with 
Core XQuery in the strict sense because we have excluded 
position arithmetic and attributes from Core XQuery and 
thus cannot simulate the tuples of monad algebra that are 
essential to its expressiveness. 


To the best of the author’s knowledge, this is the first work 
characterizing the complexity of XQuery. The mappings to 
and from monad algebra also give an argument that Core 
XQuery is a well-designed language that offers the “right” 
degree of expressive power. 

Related work. It seems that the most relevant work 
regarding this problem, apart from the characterization of 
the data complexity of monad algebra in [38], is on the 
complexity of nonrecursive logic programming. 

For nonrecursive logic programming, a full complexity 
characterization has been obtained for the most common 
forms of complex values (that is, values built from sets, lists, 
bags, tuples, and atomic values) and various classes of logic 
programs (with and without negation, range-restriction, and 
types). It turns out that the complexity of nonrecursive 
logic programming is robustly (for various kinds of complex 
values, and with or without range-restriction) 
NEXPTIME-complete. In the presence of negation (and necessarily 
range-restriction), nonrecursive logic programming is known to 
be in the class TA[2^{O(n)}, O(n)] and hard for the class 
TA[2^{O(n/log n)}, O(n/log n)]. 

A main difference between functional languages such as 
monad algebra and XQuery and logic programming as studied 
there is the form of nonmonotonicity employed. In 
functional languages that have the power to check the 
equality of complex values, negation is usually a redundant 
operation. Equality introduces nonmonotonicity into the 
functional languages, while the seemingly same deep equality is 
innocuous in logic programming languages. Nonmonotonicity 
in the functional languages is different from, and seemingly 
more powerful than, that obtained through negation in 
nonrecursive normal logic programming. For example, in monad 
algebra, we can compute two sets of doubly exponential size 
in two different ways (using a “map” operation that applies 
a transformation to each element of a set) and then check 
their equality, while the upper bounds on the complexity 
of nonrecursive logic programming rely on the fact that 
unifiers in SLD resolution proofs of nonrecursive logic programs 
cannot grow beyond singly exponential size. 

Other related work is on more expressive query languages. 
In particular, LDM logic without powerset has been proved 
complete for the class TA[2^{O(n)}, O(n)]. Differently from 
monad algebra, LDM logic is a logical language with 
quantification, operates on cyclic data, and cannot express 
deep equality. 

Structure. The structure of this paper is as follows. 
Section 2 discusses some necessary complexity-theoretic 
background and introduces complex values and monad algebra 
(on sets). Section 3 studies the complexity of monad algebra 
on sets, focusing on upper bounds, while Section 4 provides 
the corresponding lower bounds. Section 5 briefly discusses 
the complexity of monad algebra on lists and bags. Section 6 
defines the Core XQuery fragment and provides efficiently 
computable mappings between monad algebra on lists and 
XQuery. Sections 7 and 8 present our results on combined 
and data complexity of XQuery, respectively. 








2. PRELIMINARIES 

2.1 Complexity-Theoretic Background 

By AC^0 we refer to the class of languages recognizable by 
LOGSPACE-uniform families of circuits of polynomial size 
and constant depth using and- and or-gates of unbounded 
fan-in. By TC^0 we refer to the same class except that in 
addition so-called majority-gates are permitted, which 
compute “true” iff more than half of their inputs are true. For 
details on circuit complexity and the notion of uniformity 
we refer to the literature. 

We assume deterministic, nondeterministic, and alternating 
Turing machines known and refer to e.g. [27] for 
definitions. By DTIME[t(n)] and NTIME[t(n)], we denote the 
classes of all problems solvable in time t(n) (where n is the 
size of the input) on deterministic and nondeterministic 
Turing machines, respectively. By DSPACE[s(n)], we denote 
the class of all problems solvable in space s(n) on 
deterministic Turing machines. By TA[t(n), a(n)], we denote the 
class of problems solvable in time t(n) using a(n) 
alternations on alternating Turing machines. 

We will use the following abbreviations for complexity 
classes in this paper: 


NETIME    = NTIME[2^{O(n)}] 
NEXPTIME  = NTIME[2^{n^{O(1)}}] 
2ETIME    = DTIME[2^{2^{O(n)}}] 
2EXPTIME  = DTIME[2^{2^{n^{O(1)}}}] 
LOGSPACE  = DSPACE[O(log n)] 
EXPSPACE  = DSPACE[2^{n^{O(1)}}] 


It is known that AC^0 ⊆ TC^0 ⊆ LOGSPACE ⊆ NEXPTIME 
⊆ TA[2^{n^{O(1)}}, 1] ⊆ TA[2^{n^{O(1)}}, n^{O(1)}] 
⊆ TA[2^{n^{O(1)}}, 2^{n^{O(1)}}] = EXPSPACE ⊆ 2EXPTIME. 
Moreover, 

NETIME ⊆ TA[2^{O(n)}, O(n)] ⊆ 2ETIME ⊆ 2EXPTIME 
(cf. e.g. [27]). 

NETIME and 2ETIME are not robust complexity classes: 
they are not closed under LOGSPACE-reductions, as can 
be verified using a simple padding argument and the Time 
Hierarchy theorem [22]. We will consider completeness for 
those classes as well as for TA[2^{O(n)}, O(n)] under 
LOGLIN-reductions, under which they are known to be closed. 
By a LOGLIN reduction, we denote a LOGSPACE 
reduction that produces output of linear size. TA[2^{O(n)}, O(n)] 
has important complete problems from logic, such as 
deciding the Theory of Real Addition. 


2.2 Complex Values and Monad Algebra 

We now introduce monad algebra on sets; monad algebra 
on lists and bags will be briefly sketched in Section 5. 

We consider complex values constructed from sets, tuples, 
and atomic values from a single-sorted domain^3. Types are 
terms of the grammar 


τ ::= Dom | {τ} | ⟨A_1 : τ_1, ..., A_k : τ_k⟩ 


where k ≥ 0. 

^3 All results in this paper immediately generalize to 
many-sorted domains. 


Consider the query language on complex values consisting 
of expressions built from the following operations (the types 
of the operations are provided as well): 

1. identity 

   id : x ↦ x 
   id : τ → τ 

2. composition^4 

   f ∘ g : x ↦ g(f(x)) 
   if f : τ → τ' and g : τ' → τ'', then f ∘ g : τ → τ'' 

3. constants from Dom ∪ {∅, ⟨⟩} (⟨⟩ is the nullary tuple) 

4. singleton set construction 

   sng : x ↦ {x} 
   sng : τ → {τ} 

5. application of a function to every member of a set 

   map(f) : X ↦ {f(x) | x ∈ X} 
   if f : τ → τ', then map(f) : {τ} → {τ'} 

6. flatten 

   flatten : X ↦ ⋃X 
   flatten : {{τ}} → {τ} 

7. pairing 

   pairwith_{A_1} : ⟨A_1 : X_1, A_2 : x_2, ..., A_n : x_n⟩ ↦ 
       {⟨A_1 : x_1, A_2 : x_2, ..., A_n : x_n⟩ | x_1 ∈ X_1} 
   pairwith_{A_1} : ⟨A_1 : {τ_1}, A_2 : τ_2, ..., A_n : τ_n⟩ → 
       {⟨A_1 : τ_1, ..., A_n : τ_n⟩} 
   (pairwith_{A_i} for i > 1 is defined analogously.) 

8. tuple formation 

   ⟨A_1 : f_1, ..., A_n : f_n⟩ : 
       x ↦ ⟨A_1 : f_1(x), ..., A_n : f_n(x)⟩ 
   if f_1 : τ → τ_1, ..., f_n : τ → τ_n, then 
       ⟨A_1 : f_1, ..., A_n : f_n⟩ : τ → ⟨A_1 : τ_1, ..., A_n : τ_n⟩ 

9. projection 

   π_{A_i} : ⟨A_1 : x_1, ..., A_i : x_i, ..., A_n : x_n⟩ ↦ x_i 
   π_{A_i} : ⟨A_1 : τ_1, ..., A_n : τ_n⟩ → τ_i 
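The operations listed above can be tried out on small values. The following is a minimal sketch in Python, not part of the paper: it assumes that sets are modelled as frozensets and tuples as sorted sequences of (attribute, value) pairs, and all function names (record, compose, tup, proj, pairwith, and so on) are hypothetical choices made for illustration.

```python
def record(**fields):
    """A tuple <A1: v1, ..., An: vn>, stored as sorted (attribute, value) pairs."""
    return tuple(sorted(fields.items()))

def compose(*fs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    def run(x):
        for f in fs:
            x = f(x)
        return x
    return run

def ident(x):
    return x

def const(c):
    """A constant query; ignores its input."""
    return lambda _x: c

def sng(x):
    return frozenset([x])

def fmap(f):
    """map(f): apply f to every member of a set."""
    return lambda X: frozenset(f(x) for x in X)

def flatten(X):
    """Union of a set of sets."""
    return frozenset(y for x in X for y in x)

def tup(**fs):
    """Tuple formation <A1: f1, ..., An: fn>."""
    return lambda x: record(**{a: f(x) for a, f in fs.items()})

def proj(a):
    """Projection on attribute a of a tuple."""
    return lambda rec: dict(rec)[a]

def pairwith(a):
    """Unnest the set stored under attribute a, copying the other attributes."""
    def run(rec):
        fields = dict(rec)
        return frozenset(record(**{**fields, a: x}) for x in fields[a])
    return run

v = record(A=frozenset({1, 2}), B=3)
print(sorted(pairwith("A")(v)))
```

Python's built-in equality on these values is deep equality, so the sketch says nothing about the distinction between atomic and deep equality that the complexity results depend on.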

The language has a nice theoretical foundation from 
programming language theory, that of structural recursion on 
sets extended by a small amount of machinery for creating 
and destroying tuples. Formally, the language above 
is a Cartesian category with a strong monad on it (where 
“strong” refers to so-called tensorial strength introduced by 
the “pairwith” operation). We call this language monad 
algebra, or M for short. 

We will use flatmap(f) as a shortcut for map(f) ∘ flatten. 
Observe that projection π is applied to tuples rather than to 
sets of tuples as in relational algebra. For example, the 
relational algebra expression π_{AB} corresponds to the expression 
map(⟨A : π_A, B : π_B⟩) in M. 

^4 Again, our convention throughout the paper is that 
(f ∘ g)(x) = g(f(x)), not f(g(x)). 







Example 2.1. The Cartesian product f × g can be defined 
as ⟨1 : f, 2 : g⟩ ∘ pairwith_1 ∘ flatmap(pairwith_2). 

Observe the difference from the product of relational 
algebra. For instance, the query id × id on a set of pairs S 
computes the set {⟨⟨x_1, x_2⟩, ⟨x_3, x_4⟩⟩ | ⟨x_1, x_2⟩, ⟨x_3, x_4⟩ ∈ S} 
rather than {⟨x_1, x_2, x_3, x_4⟩ | ⟨x_1, x_2⟩, ⟨x_3, x_4⟩ ∈ S}. □ 
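The definition of the product can be checked on a small set of pairs. The sketch below is an illustration only: it assumes sets are frozensets and tuples are sorted (attribute, value) sequences, uses the hypothetical attribute names `_1` and `_2` for the labels 1 and 2, and treats the Python pairs in S as opaque element values.

```python
def record(**fields):
    """A tuple <A1: v1, ...>, stored as sorted (attribute, value) pairs."""
    return tuple(sorted(fields.items()))

def ident(x):
    return x

def fmap(f):
    return lambda X: frozenset(f(x) for x in X)

def flatten(X):
    return frozenset(y for x in X for y in x)

def pairwith(a):
    """Unnest the set stored under attribute a."""
    def run(rec):
        fields = dict(rec)
        return frozenset(record(**{**fields, a: x}) for x in fields[a])
    return run

def product(f, g):
    """f x g := <1: f, 2: g> o pairwith_1 o flatmap(pairwith_2)."""
    def run(x):
        pair = record(_1=f(x), _2=g(x))
        unnested_first = pairwith("_1")(pair)
        # flatmap(h) is map(h) followed by flatten
        return flatten(fmap(pairwith("_2"))(unnested_first))
    return run

S = frozenset({(1, 2), (3, 4)})
result = product(ident, ident)(S)
print(len(result))
```

Each element of the result is a pair of pairs, as in the example, rather than a flat 4-tuple.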

It is customary to define Boolean queries (“predicates”) 
as queries that produce values of type {⟨⟩}, i.e., that either 
return {⟨⟩} (“true”) or ∅ (“false”) [35]. Note that the logical 
conjunction γ ∧ δ of two predicates γ and δ can be computed 
as γ × δ. 

By positive monad algebra M_∪, we denote M extended 
by the set union operation ∪. This language has a number 
of nice properties, but it is known to be incomplete 
as a practical query language because it cannot yet express 
an equality predicate 

(A_i = A_j) : ⟨A_1 : τ_1, ..., A_k : τ_k⟩ → {⟨⟩}. 

However, if we extend M_∪ by any nonempty subset of the 
operations equality (A = B), testing set membership (A ∈ B) 
or containment (A ⊆ B), selection σ_{A=B}, set difference 
“−”, set intersection ∩, or nesting^5, we always get the same 
expressive power.^6 We will call any one of these extended 
languages full monad algebra. 

Theorem 2.2. M_∪[=] = M_∪[σ] = M_∪[−] = 
M_∪[∩] = M_∪[⊆] = M_∪[∈] = M_∪[nest]. 

Moreover, generalizing selections to test against constants 
or to support “∈”, “⊆”, or Boolean combinations of 
conditions does not increase the expressiveness of full monad 
algebra. 

Example 2.3. Given a Boolean predicate γ, selection σ_γ 
can be expressed as flatmap(⟨1 : id, 2 : id ∘ γ⟩ ∘ pairwith_2 ∘ 
map(π_1)). 

Predicate (A ⊆ B) can be expressed in M_∪[=] as 

⟨A : π_A, A' : π_A ∩ π_B⟩ ∘ (A = A') 

where f ∩ g := (f × g) ∘ σ_{1=2} ∘ map(π_1). A predicate (f ⊆ g) 
can of course be expressed as ⟨1 : f, 2 : g⟩ ∘ (1 ⊆ 2). □ 

Example 2.4. Given a complex value of type 
⟨R : {τ}, S : {τ}⟩, 

difference R − S can be implemented in M_∪[σ] as 

pairwith_R ∘ map(⟨R : π_R, S_r : ⟨R : π_R, S : π_S⟩ ∘ 
    pairwith_S ∘ σ_{R=S}⟩) ∘ σ_{S_r=∅} ∘ map(π_R). 

The idea is to compute, for each element r of R, the set 
S_r of elements in S that are equal to r and then to select 
those elements r of R for which S_r is empty. □ 
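The steps of this construction can be followed on concrete data. The sketch below is illustrative only: it assumes frozensets for sets and sorted (attribute, value) sequences for tuples, treats selection as a primitive taking a Python predicate, and writes the attribute S_r as the hypothetical name `Sr`.

```python
def record(**fields):
    """A tuple <A1: v1, ...>, stored as sorted (attribute, value) pairs."""
    return tuple(sorted(fields.items()))

def get(rec, a):
    return dict(rec)[a]

def fmap(f):
    return lambda X: frozenset(f(x) for x in X)

def select(pred):
    """Selection, taken as a primitive here."""
    return lambda X: frozenset(x for x in X if pred(x))

def pairwith(a):
    def run(rec):
        fields = dict(rec)
        return frozenset(record(**{**fields, a: x}) for x in fields[a])
    return run

def difference(value):
    """R - S for a value <R: {...}, S: {...}>, following the example step by step."""
    # pairwith_R: one tuple <R: r, S: S> per element r of R
    per_r = pairwith("R")(value)

    def attach_matches(rec):
        # <R: pi_R, S: pi_S> o pairwith_S o sigma_{R=S}: the elements of S equal to r
        candidates = pairwith("S")(rec)
        matches = select(lambda p: get(p, "R") == get(p, "S"))(candidates)
        return record(R=get(rec, "R"), Sr=matches)

    tagged = fmap(attach_matches)(per_r)
    # sigma_{S_r = empty set}, then map(pi_R)
    kept = select(lambda rec: get(rec, "Sr") == frozenset())(tagged)
    return fmap(lambda rec: get(rec, "R"))(kept)

print(sorted(difference(record(R=frozenset({1, 2, 3}), S=frozenset({2, 4})))))
```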

^5 The “nest” operation of complex value algebra without 
powerset [1] groups tuples by some of their attributes. 
For example, nest_{C=(B)}(R) on relation R(AB) 
computes the value {⟨A: x, C: {⟨B: y⟩ | ⟨A: x, B: y⟩ ∈ R}⟩ | 
(∃y) ⟨A: x, B: y⟩ ∈ R}. 

^6 No analogous statement can be made about flat relational 
algebra. 


Theorem 2.2 demonstrates that full monad algebra (w.l.o.g., 
M_∪[=]) is a very robust notion. It can serve as an 
“expressiveness benchmark” for query languages on complex-value 
databases. Indeed, it has been shown that full monad 
algebra is a conservative extension of relational algebra:^7 

Theorem 2.5. A mapping from a (flat) relational 
database to a (flat) relation is expressible in M_∪[=] if and 
only if it is expressible in relational algebra. 

There are a number of alternative ways of stating the 
query evaluation problem. In this paper, we study the com¬ 
plexity of Boolean queries. For XQuery, we will study the 
problem of deciding whether the root node (which must be 
always present in a valid XQuery result) of the resulting 
XML tree has children. 

In the following, we will discuss three kinds of complexity 
of query evaluation: data complexity (where queries are 
assumed to be fixed and data variable), query complexity 
(where the query is variable and the data is assumed to be 
fixed), and combined complexity (where both data and query 
are considered variable) [40]. 

3. COMPLEXITY OF MONAD ALGEBRA 

We will study the complexity of full monad algebra M_∪[=] 
as well as monotone fragments. It is folklore that by 
extending M_∪ by equality on atomic values, =_atomic, we still 
cannot express nonmonotone operations such as equality of 
sets or negation. We can safely generalize =_atomic to 
equality of arbitrary complex values that do not include sets, 
=_mon, defined inductively as =_atomic on atomic values and 
v_1 =_mon w_1 ∧ ··· ∧ v_k =_mon w_k on tuples ⟨v_1, ..., v_k⟩ and 
⟨w_1, ..., w_k⟩. Of course this generalization does not improve 
upon the expressiveness of M_∪[=_atomic]. 

Proposition 3.1. M_∪[=_atomic] captures M_∪[=_mon]. 

Proof Sketch. Of course, every M_∪[=_atomic] query is also 
an M_∪[=_mon] query. For the other direction, we can 
define =_mon using =_atomic given the type τ of the values to 
compare. Viewing each such tuple type as a ranked tree τ, 
we simply define (A =_mon B) as the conjunction 
(implemented as the Cartesian product) of the equality predicates 
(A.π =_atomic B.π) for each attribute path π in τ from the 
root to a leaf. For example, for type 

τ = ⟨C : ⟨D : Dom, E : ⟨F : Dom, G : Dom⟩⟩, H : Dom⟩, 

(A =_mon B) := 

(A.C.D =_atomic B.C.D) × (A.C.E.F =_atomic B.C.E.F) × 
(A.C.E.G =_atomic B.C.E.G) × (A.H =_atomic B.H). 

(By definition, these types must be constructed from tuples 
and atomic values.) □ 
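The enumeration of attribute paths used in this proof sketch is easy to mechanize. The following sketch is illustrative only; it assumes a tuple type is given as a nested Python dict with the string "Dom" at the leaves, and the function names are hypothetical.

```python
def leaf_paths(tau, prefix=()):
    """Yield every attribute path from the root of a tuple type to a Dom leaf."""
    if tau == "Dom":
        yield prefix
    else:
        for attribute, subtype in tau.items():
            yield from leaf_paths(subtype, prefix + (attribute,))

def mon_equality(tau):
    """Spell out (A =mon B) as a product of atomic equality predicates."""
    conjuncts = []
    for path in leaf_paths(tau):
        dotted = ".".join(path)
        conjuncts.append(f"(A.{dotted} =atomic B.{dotted})")
    return " x ".join(conjuncts)

tau = {"C": {"D": "Dom", "E": {"F": "Dom", "G": "Dom"}}, "H": "Dom"}
print(mon_equality(tau))
```

The number of conjuncts equals the number of leaves of the type, so the resulting expression is linear in the size of τ.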

We start our complexity study with data complexity. It 
is quite easy to conclude from Theorem 2.5 (conservativity 
over relational algebra) that the data complexity of M_∪[σ] 
must be rather low. 

Proposition 3.2 (Folklore, [38]). M_∪[=] is in TC^0 
w.r.t. data complexity. 

^7 A generalized version of Theorem 2.5 can be found in [35]. 




Since the proof in [38] is somewhat involved, we provide 
an alternative direct proof in the appendix. 

We will now show that the query complexity of monad 
algebra is substantially higher. Actually, it is possible to 
write queries that compute values of doubly exponential size. 
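The doubly exponential growth can be observed on small instances. The sketch below is an illustration, not the paper's construction verbatim: it assumes Python tuples stand in for nested pairs and uses itertools.product to play the role of a Cartesian product of a set with itself, which squares the cardinality at every step.

```python
from itertools import product

def nested_pairs(m):
    """All nested pairs of depth m over {0, 1}, built by m self-products."""
    values = frozenset({0, 1})
    for _ in range(m):
        values = frozenset(product(values, values))
    return values

def nested_pair_count(m):
    """The cardinality alone, without materializing the set."""
    size = 2
    for _ in range(m):
        size = size * size  # a self-product squares the cardinality
    return size

for m in range(4):
    print(m, len(nested_pairs(m)), nested_pair_count(m))
```

Since the cardinality is squared m times starting from 2, it equals 2^(2^m); materializing the set is only practical for very small m.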

Proposition 3.3. There is an M_∪ query Q that computes 
a value of size 2^{2^{Ω(|Q|)}}. 

Proof. Consider the query Q 

φ_{0,1} ∘ (id × id) ∘ ··· ∘ (id × id)      ((id × id) repeated m times) 

where φ_{0,1} = (0 ∘ sng) ∪ (1 ∘ sng) computes the set {0, 1} 
and m is linear in |Q|. Query Q computes the set of all 
nested pairs (= binary trees) of depth m with labels from 
{0, 1} at the leaves. There are 2^{2^m} such nested pairs. □ 

For the converse, 

Proposition 3.4. The values computable by M_∪[=] queries 
are of size 2^{2^{O(n)}}, where n is the size of the input (i.e., 
database and query). 

Proof. Let C_f(n), for each M_∪[=] expression f, be defined 
as follows: For constants, it is O(1); for the operation id, it is 
n; for sng, it is n + O(1); for flatten, σ, and π, it is n; for 
pair construction ⟨1 : f, 2 : g⟩, it is C_f(n) + C_g(n) + O(1); for 
union f ∪ g, it is C_f(n) + C_g(n); for pairwith, it is n^2 + O(1); 
and finally, for f ∘ g, it is C_g(C_f(n)). 

It is easy to see that C_f provides us with an upper bound 
on the size of the value obtained by applying M_∪[=] 
expression f on a value of size n. 

For n > 1, pairwith is the locally costliest operation, so 
let us assume that Q consists of the composition of |Q| 
operations with this cost as an upper bound. In particular, 
this will provide an overestimation of the size of the 
computed value because for n > 1, 

C_f(n) + C_g(n) + O(1) ≤ C_{pairwith ∘ ··· ∘ pairwith}(n)      (|f| + |g| times pairwith) 
                       = (···((n^2 + O(1))^2 + O(1))···)^2 + O(1). 

Now, 

|[[Q]](D)| ≤ C_Q(|D|) 
           ≤ (···((|D|^2 + O(1))^2 + O(1))···)^2 + O(1)      (|Q| squarings) 
           ≤ (···((|D| + O(|Q|))^2)^2···)^2                  (|Q| squarings) 
           ≤ 2^{2^{O(|D| + |Q|)}}. 

□ 

Corollary 3.5. M_∪[=] is in 2ETIME w.r.t. combined 
complexity. 

This is easy to see because given an input value of size 
2^{2^{O(n)}}, each operation of M_∪[=] can be evaluated on the 
input in time 2^{2^{O(n)}} on a random access machine. There 
are |Q| ≤ n operations, and |Q| · 2^{2^{O(n)}} = 2^{2^{O(n)}}. 

Since monad algebra has the power to construct arbitrary 
values from scratch, we will use the following proposition 
and will subsequently focus on query complexity. As 
all complexity classes we will consider for query and 
combined complexity throughout this paper will be closed under 
LOGLIN-reductions, combined complexity is no harder than 
query complexity. 

Proposition 3.6. There is a LOGLIN reduction that, 
given a complex value v, computes an M_∪ expression that 
evaluates to v on an arbitrary (e.g. empty) database. 

The main upper bound results of this paper follow next. 

Theorem 3.7. M_∪[=_atomic] is in NEXPTIME 
w.r.t. query complexity. 

Proof Sketch. Without loss of generality, we may assume 
that all operations of the given monad algebra query are 
unary. This requires only a slight change of notation when 
we use the union operation ∪: Rather than writing f ∪ g, 
we write ⟨A : f, B : g⟩ ∘ ∪. 

The proof is by a LOGSPACE-reduction to the success 
problem of nonrecursive logic programming with function 
symbols (but without sets), i.e. the problem of deciding 
whether a distinguished Boolean predicate evaluates to true. 
This problem is known to be NEXPTIME-complete [10]. 

We now come to address the observation made in the 
introduction that while monad algebra queries may 
compute complex values of doubly exponential size 
(Proposition 3.3), resolution proofs for nonrecursive logic programs 
are always of only singly exponential size. We show 
that every M_∪[=_atomic] query can be reduced to a 
nonrecursive logic program with a single binary function 
symbol f. For convenience, we write terms built using f in a 
contrived list representation (paths); for example, the term 
f(f(x, y), f(z, f(u, v))) will be written as (x.y).z.u.v. Left 
f-term children are considered Skolem functions generating 
new path labels. For example, (x.y).z.u.v is understood as 
a path w.z.u.v where w is a label generated from and 
identified by x.y. 
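The path notation for f-terms can be made concrete with a few lines of code. This is a small sketch under the assumption that an f-term is either a string (a variable or constant) or a Python pair (left, right) standing for f(left, right); the function name to_path is hypothetical.

```python
def to_path(term):
    """Render an f-term in path notation; nested left children are parenthesized."""
    if isinstance(term, str):
        return term
    left, right = term
    head = left if isinstance(left, str) else f"({to_path(left)})"
    return f"{head}.{to_path(right)}"

# f(f(x, y), f(z, f(u, v)))
term = (("x", "y"), ("z", ("u", "v")))
print(to_path(term))
```

A parenthesized left child such as (x.y) then acts as a single generated label, while the right spine of the term spells out the remaining steps of the path.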

We view every complex value as a deterministic tree, i.e., 
a tree in which each node v is uniquely identified by the 
path of labels from the root to v. We are able to uniquely 
assign such labels, including the elements of an index set 
to the elements of a set value, as we are considering query 
complexity and construct every set value from scratch (see 
Proposition 3.6). Such a deterministic tree is of course fully 
described by the set of root-to-leaf paths occurring in it. 

We can now give an alternative semantics of M_∪[=_atomic] 
in terms of deterministic trees, that is, each query maps a 
deterministic tree given as a set of paths to a deterministic 
tree given as a set of paths, with 

[[id]](V) := V 

[[c]](V) := {c} 

[[π_A]](V) := {v | A.v ∈ V} 

[[sng]](V) := {s.v | v ∈ V} 

[[f ∘ g]](V) := [[g]]([[f]](V)) 

[[flatten]](V) := {(i.j).v | i.j.v ∈ V} 

[[A =_atomic B]](V) := {⟨⟩ | A.v, B.v ∈ V} 

[[π_A ∪ π_B]](V) := 
    {(1.i).v | A.i.v ∈ V} ∪ {(2.i).v | B.i.v ∈ V} 

[[⟨A_1 : f_1, ..., A_k : f_k⟩]](V) := 
    {A_1.v_1, ..., A_k.v_k | v_1 ∈ [[f_1]](V) ∧ ··· ∧ v_k ∈ [[f_k]](V)} 

[[map(f)]](V) := 
    {i.w | ∃u : i.u ∈ V ∧ w ∈ [[f]]({v | i.v ∈ V})} 

[[pairwith_{A_j}]](V) := {i.A_j.v | A_j.i.v ∈ V} ∪ 
    {i.A_k.w | A_j.i.v, A_k.w ∈ V ∧ j ≠ k} 

Here, V always denotes a set of paths, u, v, w, v_1, ..., v_k 
denote paths, and i, j denote indexes of set members. The 
symbol ⟨⟩ in the definition for the equality predicate is to be 
understood as a constant and a path of length one. Observe 
how the flatten operation merges two set member indexes 
i and j into one, exploiting our binary function symbol for 
encoding paths. 

An example demonstrating the construction of 

{0, 1} ∘ (id × id) = ⟨1 : 0 ∘ sng, 2 : 1 ∘ sng⟩ ∘ ∪ ∘ 
    ⟨A : id, B : id⟩ ∘ pairwith_A ∘ map(pairwith_B) ∘ flatten 

is shown in Figure 1. This query evaluates to a deterministic 
tree that can be uniquely specified by its set of root-to-leaf 
paths {((1.s).1.s).A.0, ((1.s).1.s).B.0, ..., ((2.s).2.s).B.1}. 
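The path-set semantics can be executed directly on this example. The sketch below is an illustration under several assumptions that are not from the paper: a path is a Python tuple of labels, a merged label such as (1.s) is a nested tuple, the union operator is given the two attribute names explicitly, and empty set values (which contribute no paths) are ignored. All function names are hypothetical.

```python
def compose(*fs):
    def run(V):
        for f in fs:
            V = f(V)
        return V
    return run

def ident(V):
    return set(V)

def const(c):
    return lambda V: {(c,)}

def sng(V):
    return {("s",) + v for v in V}

def tup(fs):
    """Tuple formation: prefix each path of f_k's result with attribute A_k."""
    return lambda V: {(a,) + v for a, f in fs.items() for v in f(V)}

def union(a, b):
    """pi_a U pi_b: merge the attribute tag (1 or 2) with the member index."""
    return lambda V: ({((1, p[1]),) + p[2:] for p in V if p[0] == a}
                      | {((2, p[1]),) + p[2:] for p in V if p[0] == b})

def flatten(V):
    """Merge two consecutive member indexes i, j into one label (i.j)."""
    return {((p[0], p[1]),) + p[2:] for p in V}

def pairwith(a):
    def run(V):
        moved = {(p[1], a) + p[2:] for p in V if p[0] == a}
        indexes = {p[1] for p in V if p[0] == a}
        copied = {(i,) + p for i in indexes for p in V if p[0] != a}
        return moved | copied
    return run

def fmap(f):
    def run(V):
        out = set()
        for i in {p[0] for p in V}:
            member = {p[1:] for p in V if p[0] == i}
            out |= {(i,) + w for w in f(member)}
        return out
    return run

query = compose(
    tup({1: compose(const(0), sng), 2: compose(const(1), sng)}),
    union(1, 2),
    tup({"A": ident, "B": ident}),
    pairwith("A"),
    fmap(pairwith("B")),
    flatten,
)
paths = query({("dummy",)})
print(len(paths))
```

In this encoding the path ((1.s).1.s).A.0 appears as the tuple (((1, "s"), (1, "s")), "A", 0); the result has one path per pair and attribute.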

The reduction of monad algebra queries to nonrecursive 
logic programming is now technical but not difficult. 

Our predicates are binary and of the form p(X, v), where 
X is a path prefix identifying a node w of the deterministic 
tree representation of our complex value, and v denotes 
one of the paths to leaves emanating from w, which taken 
together fully specify the complex value below node w. 

• We translate an expression map(f) on path X 
represented by predicate [[Q]] into 

[[Q; start_map]](X.i, v) ← [[Q]](X, i.v). 

[[Q; map(f)]](X, i.v) ← [[Q; start_map; f]](X.i, v). 

plus the translation of f mapping from predicate 
[[Q; start_map]] to [[Q; start_map; f]]. 

That is, on a value identified by path prefix X, we 
move down to the set member children of X, the X.i. 
Then we apply f on the values X.i, and finally, we 
return to X. 

• We translate an expression ⟨A_1 : f_1, ..., A_k : f_k⟩ on 
path X represented by predicate [[Q]] into 

[[Q; ⟨A_1 : f_1, ..., A_k : f_k⟩]](X, A_1.v) ← [[Q; f_1]](X, v). 
    ... 
[[Q; ⟨A_1 : f_1, ..., A_k : f_k⟩]](X, A_k.v) ← [[Q; f_k]](X, v). 

plus, for each 1 ≤ i ≤ k, the translation of f_i mapping 
from predicate [[Q]] to predicate [[Q; f_i]]. 

• Compositions f ∘ g are read as f; g, and f and g are 
translated separately. The result predicate of f is used 
as the input predicate of g. 



[Figure 1, panels not reproduced. Surviving panel labels: 
(d): ((0 ∘ sng) ∪ (1 ∘ sng)) ∘ ⟨A : id, B : id⟩ ∘ pairwith_A; 
(f): ((0 ∘ sng) ∪ (1 ∘ sng)) ∘ ⟨A : id, B : id⟩ ∘ pairwith_A ∘ 
map(pairwith_B) ∘ flatten.] 

Figure 1: Construction of tree {0, 1} ∘ (id × id). 



• The remaining operations are translated as follows. 

[[Q; c]](X, c) ← [[Q]](X, v). 

[[Q; pairwith_B]](X, i.B.v) ← [[Q]](X, B.i.v). 

[[Q; pairwith_B]](X, i.A.v) ← [[Q]](X, A.v), 
                              [[Q]](X, B.i.w). 

[[Q; flatten]](X, (i.j).v) ← [[Q]](X, i.j.v). 

[[Q; (A =_atomic B)]](X, s.⟨⟩) ← [[Q]](X, A.v), 
                                 [[Q]](X, B.v). 

[[Q; π_A ∪ π_B]](X, (1.i).v) ← [[Q]](X, A.i.v). 

[[Q; π_A ∪ π_B]](X, (2.i).v) ← [[Q]](X, B.i.v). 

[[Q; π_{A_i}]](X, v) ← [[Q]](X, A_i.v). 

[[Q; sng]](X, s.v) ← [[Q]](X, v). 


By Proposition 3.6 we may assume that our query ignores 
the input data; so we assume a predicate [[ε]] and a fact 
[[ε]](ε, dummy) ← . as part of our logic program. 

It is not hard to verify that this translation of a query 
Q in M_∪[=_atomic] into a nonrecursive logic program can be 
effected in LOGSPACE and that indeed the goal [[Q]](ε, s.⟨⟩) 
is true iff Q evaluates to true. □ 
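As a sanity check of the translation rules, the logic program obtained for the small query ⟨1 : 0 ∘ sng, 2 : 1 ∘ sng⟩ ∘ ∪ can be evaluated bottom-up by hand. The sketch below is illustrative only: it assumes each predicate is a Python set of (path prefix, path) facts, with paths as tuples of labels and the empty prefix written as the string "e"; the short predicate names p1, ..., p6 are chosen here for brevity.

```python
def evaluate_union_example():
    """Bottom-up evaluation of the program for <1: 0 o sng, 2: 1 o sng> o U."""
    eps = {("e", ("dummy",))}                        # the fact for the input predicate
    p1 = {(X, (0,)) for (X, v) in eps}               # constant 0
    p2 = {(X, ("s",) + v) for (X, v) in p1}          # sng
    p3 = {(X, (1,)) for (X, v) in eps}               # constant 1
    p4 = {(X, ("s",) + v) for (X, v) in p3}          # sng
    p5 = ({(X, (1,) + v) for (X, v) in p2}           # tuple attribute 1
          | {(X, (2,) + v) for (X, v) in p4})        # tuple attribute 2
    p6 = ({(X, ((1, v[1]),) + v[2:]) for (X, v) in p5 if v[0] == 1}   # union, left
          | {(X, ((2, v[1]),) + v[2:]) for (X, v) in p5 if v[0] == 2})  # union, right
    return {v for (X, v) in p6 if X == "e"}

print(sorted(evaluate_union_example()))
```

The two resulting paths correspond to (1.s).0 and (2.s).1, i.e. the deterministic tree of the set {0, 1}. Because the program is nonrecursive, one pass over the rules in dependency order suffices.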


We consider two examples to illustrate the construction 
of the logic programs. To save some space, however, we 
use short predicate names p_i. 

Example 3.8. The logic program for the query 
⟨1 : 0 ∘ sng, 2 : 1 ∘ sng⟩ ∘ ∪ is 

[[ε]](ε, dummy) ← . 
p_1(X, 0)       ← [[ε]](X, v).      # constant 0 
p_2(X, s.v)     ← p_1(X, v).        # sng 
p_3(X, 1)       ← [[ε]](X, v).      # constant 1 
p_4(X, s.v)     ← p_3(X, v).        # sng 
p_5(X, 1.v)     ← p_2(X, v).        # create_tuple 
p_5(X, 2.v)     ← p_4(X, v).        # create_tuple 
p_6(X, (1.i).v) ← p_5(X, 1.i.v).    # union 
p_6(X, (2.i).v) ← p_5(X, 2.i.v).    # union 

The goal predicate p_6 computes the set of paths of the 
deterministic tree representation of the result value, that is, 
{π | p_6(ε, π) is true} = {(1.s).0, (2.s).1} (see Figure 1(b)). 

□ 

Example 3.9. On values of type {⟨A : Dom, B : Dom⟩} 
represented by predicate p_input, the query 

map(⟨C : π_A, D : π_B ∘ sng⟩) 

is encoded as the logic program 

p_1(X.i, v) ← p_input(X, i.v).    # begin_map 
p_2(X, v)   ← p_1(X, A.v).        # π_A 
p_3(X, v)   ← p_1(X, B.v).        # π_B 
p_4(X, s.v) ← p_3(X, v).          # sng 
p_5(X, C.v) ← p_2(X, v).          # create_tuple 
p_5(X, D.v) ← p_4(X, v).          # create_tuple 
p_6(X, i.v) ← p_5(X.i, v).        # end_map 

with goal p_6. □ 

The reduction to nonrecursive logic programming of the 
proof of Theorem 3.7 can be rather easily extended to a 
reduction from M_∪[=_atomic, not] to nonrecursive normal logic 
programming (that is, with negation). All we need to do is 
encode the operation “not” as 

[[Q; not]](X, s.⟨⟩) ← set[[Q]](X), 
                      not nonempty[[Q]](X). 
nonempty[[Q]](X) ← [[Q]](X, v). 

where the “set[[Q]]” predicates are defined alongside the [[Q]] 
predicates such that set[[Q]](X) is true iff X is the path prefix 
of a set, empty or not. This reduction is not in LOGLIN 
because of the size of the predicates generated. Even if we 
replace the predicate names by shorter ones of the form p_i 
(where i is an integer), they are of size log n each (where n is 
the size of the input query in monad algebra) and the overall 
size of the logic program is O(n · log n). (There are linearly 
many rules.) But since we can compose this preparation 
with an ATM run and nonrecursive range-restricted normal 
logic programming is known to be in TA[2^{O(n)}, O(n)], 
this shows that 

Corollary 3.10. M_∪[=_atomic, not] is in 
TA[2^{O(n log n)}, O(n · log n)] w.r.t. query complexity. 

We can improve this to 

Theorem 3.11. M_∪[=_atomic, not] is in TA[2^{O(n)}, O(n)] 
w.r.t. query complexity. 


Proof Sketch. The proof is direct, using alternating Turing
machines, but again incorporates the deterministic tree
technique and the idea of evaluating a logic program, now
with negation. We will sketch a fixed alternating Turing
machine M that recognizes the M_∪[=] queries that evaluate
to true. Consider the proof of NEXPTIME-membership of
nonrecursive logic programming without negation of [10]. It
is by an argument that SLD resolution for such a program is
in NEXPTIME because we can start from the goal and then
always guess and verify unifiers until we have a proof that
the goal is true. Unifiers are of singly exponential size; this
is particularly easy to see for the special programs produced
in the proof of Theorem 3.7 because there all predicates are
over paths, and each path is of size O(n · 2^{O(n)}) = 2^{O(n)},
where n is the size of the input query. (There are O(n)
steps in the paths and each step is a value of size 2^{O(n)}; no
greater tuples can be computed by an M_∪[=] query of size
n, and thus by our logic programs.)

Our ATM M basically follows such a resolution strategy
to prove that the query evaluates to true. We first compute
the logic program of Theorem 3.7 (and its extension to
support negation described above) and write it to our worktape.
Then we start proving the goal [Q](ε, *.⟨⟩). Inductively, to
prove a goal, for a given unifier, we guess a rule, adapt our
unifier to the body atoms of the rule (both using existential
configurations of the ATM), and then branch out using
universal ATM computation to check the body atoms of the
rule in parallel. Whenever we encounter a negated atom in
a rule body, we employ universal computation to verify that
this atom cannot become true. Constructed appropriately,
M of course accepts if and only if the goal [Q](ε, *.⟨⟩) is true,
and thus iff our M_∪[=] query evaluates to true.


Let us study the depth of the computation trees of M.
The depth of the proof tree of the logic program is only
linear in the size of the query. The paths in the computation
tree of M are of length 2^{O(n)} because all we need to do is
choose rules and unify very special terms (our deterministic
tree paths) of size 2^{O(n)}. (One can verify by inspection of
the construction of the logic program that this is feasible
in linear time in the size of the paths.) The number of
alternations used is bounded by the number of predicates
in the program (there are O(n) many because there are
linearly many rules) plus the number of negation symbols
in the program, which is again O(n). Thus, M is an ATM
that runs in time 2^{O(n)} with O(n) alternations, and our
result is shown. □

Remark 3.12. The previous proof amounts in no way to
a claim that we can close any gap we want (here, the gap
between TA[2^{O(n)}, O(n)] and TA[2^{O(n log n)}, O(n · log n)])
by just claiming that there is an appropriate Turing machine
that performs our reduction and then solves the problem
in the desired complexity class. But our proof shows
that the predicate names introduced by our reduction from
monad algebra to logic programming occupy space while not
contributing to the power of the logic programs. This suggests
that nonrecursive logic programming “wastes” some
succinctness. This is also supported by the fact that, because
of the space blow-up caused by the predicates, there is
a gap between the currently best known upper bound on the
complexity of normal logic programming of TA[2^{O(n)}, O(n)]
pH) and the best lower bound of TA[2^{O(n / log n)}, O(n / log n)]
sun. □

Theorem 3.13. M_∪[=] is in EXPSPACE w.r.t. combined
complexity.

Proof Sketch. We lack the space to prove this, but a
brief argument can be given. Since all values computable in
M_∪[=] are of at most doubly exponential size (see Proposition
B3), we can represent an index of a set member (or
even a path in the deterministic tree representation of complex
values of the proof of Theorem 3.7) in a “register” of
singly exponential size. We only use polynomially many (in
the size of the query) such registers to evaluate the query
using a strategy of recomputation of values on demand.

For example, an operation map(f) on a subvalue identified
by path prefix π can be executed by computing each of the
path prefixes π.i, where i identifies an element of the set
π (we can do this anytime we want because we know the
query), and applying f to each of the π.i.

Deep equality of values identified by path prefixes π_1 and
π_2 can be checked by verifying for each value identified by
path prefix π_1.i whether there is an equal value identified
by some path prefix π_2.j, and vice versa. Equality here in
general is again deep, so we must employ this procedure
recursively, but only up to the at most linear depth of the
values; thus we only need linearly many exp-sized registers
for checking deep equality. □
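The recursive deep equality test described in this proof sketch can be illustrated as follows. This is a minimal Python sketch under our own assumptions: sets are given as Python lists (order and duplicates are ignored by the test), tuples as Python tuples, and whole values are held in memory, whereas the proof only stores path prefixes in registers and recomputes members on demand. The function name `deep_equal` is ours.

```python
def deep_equal(a, b):
    """Deep equality on atoms, tuples, and sets (sets given as lists)."""
    if isinstance(a, list) and isinstance(b, list):
        # every member of a has an equal member in b, and vice versa
        return (all(any(deep_equal(x, y) for y in b) for x in a) and
                all(any(deep_equal(y, x) for x in a) for y in b))
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (len(a) == len(b) and
                all(deep_equal(x, y) for x, y in zip(a, b)))
    return type(a) is type(b) and a == b      # atomic values

print(deep_equal([1, [2, 3]], [[3, 2], 1, 1]))   # True
print(deep_equal([1, 2], [1]))                   # False
```

The recursion depth is bounded by the nesting depth of the values, which mirrors the observation that only linearly many registers are needed.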

4. LOWER BOUNDS 

In this section we establish lower bounds matching the
upper bounds of Theorems 3.7 and 3.11.

Theorem 4.1. M_∪[=atomic] is NEXPTIME-hard w.r.t.
query complexity.


Proof Sketch. The proof is by a LOGSPACE-reduction
from NEXPTIME Turing machine acceptance. Let M =
(Q^M, q_0^M, δ^M, F^M) be a nondeterministic Turing machine
(NTM) that runs in time 2^{n^{O(1)}} on inputs of size n. We
simulate the computation of M in M_∪[=atomic]. Each run
of M is a sequence of configurations of length 2^{K(n)}, for a
suitable k and K(n) = n^k. (We may assume w.l.o.g. that
terminating computation paths of M remain in a final state
until time 2^{K(n)} by appropriate design of M.) Each configuration
of M consists of a read/write tape, a current state,
and a position marker on the tape. Of course every 2^{K(n)}-time
NTM computation uses tape space bounded by 2^{K(n)}.

There are two main difficulties that we face in this reduction:
We have to (i) deal with Turing machine tapes and
configurations of exponential size and have to (ii) model
the accepting computations of M of exponential length
succinctly; the M_∪[=atomic] query that must achieve this has
to be computable in LOGSPACE and thus must be of polynomial
size.

Modeling configurations. 

• Each tape of a configuration is modeled as a tuple of
arity 2^{K(n)} (or more precisely, nested pairs of nesting
depth K(n)) of tape symbols.

Let Σ = {s_1, ..., s_c} be the (fixed) tape alphabet of
M. Rather than representing the current position of
the read/write head on the tape separately from the
tape, we will assume a valid tape over extended tape
alphabet Σ' = Σ ∪ {▷s◁ | s ∈ Σ} to contain a single
symbol ▷s◁ (with s ∈ Σ) on the tape that indicates
that this tape position stores symbol s and is the current
position of the read/write head.

We can compute the set of all (2 · c)^{2^{K(n)}} such tapes
in M_∪ as

Tapes := φ_{Σ'} ∘ (id × id) ∘ ··· ∘ (id × id)
         ((id × id) is applied K(n) times)

where φ_{Σ'} is an appropriate M_∪ expression that computes
Σ' (see footnote 8).

As a result of this construction, some elements of set
Tapes do not correspond to valid Turing tapes because
they contain either zero or more than one marker
indicating the current position of the read/write head on
the tape. We will deal with this later.

• A superset of all possible configurations is

Configs := (Tapes × Q^M) ∘ map(⟨t: π_1, q: π_2⟩).

• The start configuration, consisting of the input tape,
the start state, and the position marker at position 0
of the tape is obtained as follows.

We compute the start tape as the input x, with |x| = n,
padded with (2^{K(n)} − n) #-symbols (denoting unused
tape space) and with the first position marked, but in
our nested pairs representation.

Let query φ_x define the nested pair of depth ⌈log₂ n⌉
representing x padded by (2^{⌈log₂ n⌉} − n) #-symbols,
and with the first position marked. (This is easy to
compute in LOGSPACE.) For example, for input x =
01101, the value computed (see footnote 9) is

⟨⟨⟨▷0◁, 1⟩, ⟨1, 0⟩⟩, ⟨⟨1, #⟩, ⟨#, #⟩⟩⟩.

8 I.e., φ_{Σ'} := s_1 ∘ sng ∪ ··· ∪ s_c ∘ sng ∪ ▷s_1◁ ∘ sng ∪ ··· ∪ ▷s_c◁ ∘ sng.

The start tape is

φ_start := ⟨1: φ_x, 2: φ_empty⟩ ∘ φ_pad ∘ ··· ∘ φ_pad
           (φ_pad is applied K(n) − ⌈log₂ n⌉ − 1 times)

with

φ_pad := ⟨1: id, 2: ⟨1: π_2, 2: π_2⟩⟩,

φ_empty := # ∘ ⟨id, id⟩ ∘ ⟨id, id⟩ ∘ ··· ∘ ⟨id, id⟩
           (⟨id, id⟩ is applied ⌈log₂ n⌉ times).

This takes the value computed by φ_x, which contains
the input and some padding up to 2^{⌈log₂ n⌉} symbols,
pairs it with a sequence of #-symbols of the same
length (computed by φ_empty), and then iteratively doubles
the length of the tape by appending two copies of
the second half of the already computed tape (because
the second half consists exclusively of #-symbols). By
this trick, there is a fixed expression φ_pad that we can
compose our query with to double the length of the
value produced.

The start configuration is

C_start := ⟨t: φ_start, q: q_0^M⟩.

Observe that C_start is a valid configuration with precisely
one tape head position marker.

• The accepting configurations are those configurations
in which the state is an element of the set F^M =
{f_1, ..., f_{|F^M|}} of accepting states of M:

AcceptingConfigs :=
    Configs ∘ (σ_{q =atomic f_1} ∪ ··· ∪ σ_{q =atomic f_{|F^M|}}).


• In the following, we will test equality of nested pairs
(tape segments) and configurations of exponential size.
We can define an equality test =mon of linear size on
tapes and tape segments using only =atomic inductively
as follows. On values of type Dom, =mon is
=atomic. Otherwise, on pairs ⟨1: τ_1, 2: τ_2⟩,

(A =mon B) := ((π_A ∘ φ) × (π_B ∘ φ)) ∘
    σ_{1.T =atomic 2.T} ∘ σ_{1.V =mon 2.V} ∘ (id × id) ∘
    σ_{1.1.T =atomic “1”} ∘ σ_{2.1.T =atomic “2”} ∘ map(⟨⟩)

where φ := (⟨T: 1, V: π_1⟩ ∘ sng ∪ ⟨T: 2, V: π_2⟩ ∘ sng).
For configurations C, C',

(C =mon C') :⇔ (C.t =mon C'.t ∧ C.q =atomic C'.q).


• Next we define an M_∪[=atomic] expression φ_succ that
computes the pairs of configurations ⟨C, C'⟩ such that
C' is a possible successor of C, i.e., computable using
the transition relation δ^M of M in one step.

9 It may be advisable to have a special symbol indicating 
the left end of the tape on its leftmost position to help the 
machine avoid running out of bounds. We assume such a 
symbol part of the input, rather than of our construction.

Figure 2: Zooming into the tapes to find a valid tape
change resulting from a computation step of M.


Here the exponential size of the configurations is a
problem; φ_succ has to be chosen carefully in order not
to be of exponential size. We achieve this as follows.
We start with the Cartesian product of Configs: all
pairs of configurations, even many that are invalid because
they have zero or more than one head marker.
For each pair, we make working copies w, w' of the
tapes. We achieve this by the M_∪ expression

φ_prepare-succ := Configs ∘ (id × id) ∘
                  map(⟨s: id, w: π_{C.t}, w': π_{C'.t}⟩).

For w' to be a possible successor of w, the two tapes
may differ in at most two consecutive tape positions
(if the tape head moved; otherwise they may only
differ in at most one position), and these positions
must contain the read/write head position marker. We
synchronously “zoom into” the working copies to find
these two positions using the following three rules:

1. If w.2 = w'.2 (i.e., the second halves of the tapes
are equal), replace w by w.1 and w' by w'.1.

2. If w.1 = w'.1 (i.e., the first halves of the tapes are
equal), replace w by w.2 and w' by w'.2.

3. If w.1.1 = w'.1.1 and w.2.2 = w'.2.2 (i.e., the first
and last quarters of the tapes are equal), replace
w by the newly constructed pair ⟨1: w.1.2, 2: w.2.1⟩
and w' by ⟨1: w'.1.2, 2: w'.2.1⟩ (that is, by the
second and third quarters).

All three cases may apply at the same time because
the tapes of two valid configurations C, C', where C'
is a successor of C, can be equal. An example of iterative
zooming is shown in Figure 2. There, we look
at a nested pair term of depth four (covering a tape of
length 16) and the tape change occurs at positions 6
and 7. We first zoom into the left half (using Rule 1) and
from there into the right half (using Rule 2). Now both
halves differ, but the first and fourth quarters do not,
so we can use Rule 3 to zoom down to the differing
tape positions 6 and 7. In general, we obtain a tape
sequence of length two by zooming into a tape (which
is of length 2^{K(n)}) K(n) − 1 times.

In our encoding in M_∪[=atomic], we will compute the
union of all triples ⟨s, w, w'⟩ such that s is a pair of
configurations with tapes t = uwv and t' = uw'v and
w and w' are of length 2 (i.e., w and w' are, if any,
the only corresponding subsequences in t, t' that differ).
Now we have to make sure that w and w' contain the
position marker.


This can be expressed in M_∪[=atomic] as follows:

φ_witness-succ := φ_prepare-succ ∘
                  φ_zoom-in ∘ ··· ∘ φ_zoom-in ∘ φ_marker
                  (φ_zoom-in is applied K(n) − 1 times)

where (see footnote 10)

φ_zoom-in := (σ_{12▷34◁} ∘ π_{12▷34◁} ∪
              σ_{▷12◁34} ∘ π_{▷12◁34} ∪
              σ_{1▷23◁4} ∘ π_{1▷23◁4})

σ_{12▷34◁} := σ_{w.1 =mon w'.1}
π_{12▷34◁} := map(⟨s: π_s, w: π_{w.2}, w': π_{w'.2}⟩)
σ_{▷12◁34} := σ_{w.2 =mon w'.2}
π_{▷12◁34} := map(⟨s: π_s, w: π_{w.1}, w': π_{w'.1}⟩)
σ_{1▷23◁4} := σ_{w.1.1 =mon w'.1.1} ∘ σ_{w.2.2 =mon w'.2.2}
π_{1▷23◁4} := map(⟨s: π_s,
                   w: π_w ∘ ⟨1: π_{1.2}, 2: π_{2.1}⟩,
                   w': π_{w'} ∘ ⟨1: π_{1.2}, 2: π_{2.1}⟩⟩)

and φ_marker selects those tuples for which w, w' are
two tapes of length two that contain the read/write
head marker:

φ_marker := (σ_{w.1 =atomic ▷s_1◁} ∪ ··· ∪ σ_{w.1 =atomic ▷s_c◁} ∪
             σ_{w.2 =atomic ▷s_1◁} ∪ ··· ∪ σ_{w.2 =atomic ▷s_c◁}).

Now, for each ⟨s: X, w: Y, w': Z⟩ ∈ ⟦φ_witness-succ⟧,
either C = C' or Y, Z are precisely the at most two
adjacent positions of the tapes of C and C' that can
differ if C' is to be a successor of C. We can easily
encode the valid successors with respect to transition
relation δ^M by a union of expressions that amount to
selecting every pair of tapes that matches one of the
transition rules encoded in δ.

φ_succ := φ_witness-succ ∘ (σ_{γ_1} ∪ ··· ∪ σ_{γ_m}) ∘ map(π_s)

For instance, if (q', b, +1) ∈ δ(q, a), one σ_{γ_i} is to select
the triples ⟨s: ⟨C: ⟨t: u▷a◁sv, q: q⟩, C': ⟨t: ub▷s◁v, q: q'⟩⟩,
w: ▷a◁s, w': b▷s◁⟩ ∈ ⟦φ_witness-succ⟧.
(Details are omitted for lack of space, but it is important
to note that the values that we are dealing with
are atomic, so we only need equality on atomic values.)

As for Configs, φ_succ contains pairs ⟨C, C'⟩ of invalid
configurations. However, whenever C is a valid configuration,
C' is indeed a possible successor configuration
on M. It follows by induction that, starting from valid
configuration C_start, we will only reach valid configurations
via the successor relation φ_succ.

10 Here and later, π_{A_1.···.A_m} := π_{A_1} ∘ ··· ∘ π_{A_m}.

Modeling computations. Now we are ready to model
accepting computations of M. Here the problem is the possibly
exponential running time. We use a simple recursive
divide-and-conquer approach in the spirit of the usual proof
of Savitch’s theorem (cf. e.g. EH). Let ψ_i be the set of pairs of
configurations ⟨C, C'⟩ such that C' is reachable from C in 2^i
steps. We define ψ_i as

ψ_0 := φ_succ
ψ_{i+1} := ψ_i ∘ (id × id) ∘ σ_{1.C' = 2.C} ∘
           map(⟨C: π_{1.C}, C': π_{2.C'}⟩)

Note that the definition of ψ_{i+1} uses ψ_i only once, thus the
formula remains computable in LOGSPACE.

There is an accepting computation path of length 2^{K(n)}
iff there is a pair ⟨C, C'⟩ in ψ_{K(n)} such that C = C_start and
the state of C' is in F^M. We can phrase this as

φ_accept := ((⟨1: C_start, 2: ψ_{K(n)}⟩ ∘ pairwith_2 ∘
             σ_{1 =mon 2.C} ∘ map(π_{2.C'})) × AcceptingConfigs) ∘
            map((1 =mon 2)) ∘ flatten.

(Again we only employ equality on configurations.)

It is not difficult to see that the M_∪[=atomic] query φ_accept
constructed is of polynomial size and can be computed in
LOGSPACE. The entire problem is formulated as the query
(e.g., the input x is constructed from constants and pairs)
and φ_accept will not make use of an input value. Thus we
have shown that M_∪[=atomic] is NEXPTIME-hard with respect
to query complexity (i.e., for a fixed database). □


It is not hard to verify that


Lemma 4.2. For the construction of the proof of Theorem 4.1,
(a) |φ_accept| = O(K(n)^2). (b) If =mon is available
as a built-in, φ_accept can be defined such that |φ_accept| =
O(K(n)).

Proof Sketch. We can verify by inspection of the proof of
Theorem 4.1 that

|Configs|          = O(K(n)),
|C_start|          = O(K(n)),
|AcceptingConfigs| = |Configs| + O(1) = O(K(n)),
|=mon|             = O(K(n)),
|φ_prepare-succ|   = O(|Configs|),
|φ_zoom-in|        = O(|=mon|),
|φ_witness-succ|   = O(|Configs|) + O(|φ_zoom-in| · K(n))
                   = O(K(n)^2),
|φ_succ|           = |φ_witness-succ| + O(1)
                   = O(K(n)^2),
|ψ_{K(n)}|         = |φ_succ| + O(K(n) · |=mon|)
                   = O(K(n)^2),
|φ_accept|         = |C_start| + |ψ_{K(n)}| + |AcceptingConfigs|
                   = O(K(n)^2).

From this it is also clear that (b) if we use a built-in =mon
operation rather than our defined monotone equality operation,
|φ_accept| = O(K(n)). □
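The building blocks whose sizes are tallied above can be exercised on a toy instance. The following Python sketch is not part of the reduction and makes several assumptions of our own: ordinary Python values stand in for monad algebra expressions, the ASCII string `>s<` stands for a marked symbol, and the helper names (`nest`, `cells`, `start_tape`, `eq_mon`, `zoom`, `psi`) are hypothetical. It mirrors the nested-pair start tape with padding, the monotone equality test (which only ever asks whether a collection is nonempty), the three zoom-in rules, and the doubling step that defines ψ_{i+1} from ψ_i.

```python
def nest(cs):
    """Nested pairs for a list of cells whose length is a power of two."""
    if len(cs) == 1:
        return cs[0]
    half = len(cs) // 2
    return (nest(cs[:half]), nest(cs[half:]))

def cells(t):
    """Flatten a nested pair back into its list of cells."""
    return cells(t[0]) + cells(t[1]) if isinstance(t, tuple) else [t]

def start_tape(x, K):
    """Pair phi_x with phi_empty, then pad K - ceil(log2 n) - 1 times."""
    d = (len(x) - 1).bit_length()          # ceil(log2 n) without floats
    phi_x = nest([">" + x[0] + "<"] + list(x[1:]) + ["#"] * (2 ** d - len(x)))
    phi_empty = nest(["#"] * 2 ** d)
    tape = (phi_x, phi_empty)
    for _ in range(K - d - 1):
        tape = (tape, (tape[1], tape[1]))  # phi_pad doubles the length
    return tape

def eq_mon(a, b):
    """Monotone equality on equally shaped nested pairs: [()] or []."""
    if not isinstance(a, tuple):
        return [()] if a == b else []
    same = [i for i in (0, 1) if eq_mon(a[i], b[i])]   # agreeing components
    both = [(i, j) for i in same for j in same if i == 0 and j == 1]
    return [()] if both else []

def zoom(w, v):
    """All length-two windows of w and v reachable with Rules 1-3."""
    if not isinstance(w[0], tuple):
        return {(w, v)}
    found = set()
    if eq_mon(w[1], v[1]):                 # Rule 1: second halves equal
        found |= zoom(w[0], v[0])
    if eq_mon(w[0], v[0]):                 # Rule 2: first halves equal
        found |= zoom(w[1], v[1])
    if eq_mon(w[0][0], v[0][0]) and eq_mon(w[1][1], v[1][1]):   # Rule 3
        found |= zoom((w[0][1], w[1][0]), (v[0][1], v[1][0]))
    return found

def psi(succ, k):
    """psi_k from psi_0 = succ by k doubling steps (pairs 2^k steps apart)."""
    rel = set(succ)
    for _ in range(k):
        rel = {(c, e) for (c, d1) in rel for (d2, e) in rel if d1 == d2}
    return rel

# Tape of length 16 whose only change is at positions 6 and 7 (1-indexed).
before = nest(list("aaaaa") + [">a<", "b"] + list("a" * 9))
after = nest(list("aaaaa") + ["c", ">b<"] + list("a" * 9))
print(zoom(before, after))    # {(('>a<', 'b'), ('c', '>b<'))}

# Toy successor relation 0 -> 1 -> ... -> 8 with a stay transition at 8.
succ = {(i, i + 1) for i in range(8)} | {(8, 8)}
print(sorted(psi(succ, 3)))   # every state reaches 8 in exactly 2**3 steps
```

In the sketch, the stay transition at state 8 plays the role of the assumption that terminating computation paths remain in a final state.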


Corollary 4.3. M_∪[=mon] is NETIME-hard under
LOGLIN-reductions (query complexity).





Theorem 4.4. M_∪[=mon, not] is TA[2^{O(n)}, O(n)]-hard
under LOGLIN-reductions (query complexity).

Proof Sketch. The proof is by a LOGLIN-reduction from
TA[2^{O(n)}, O(n)] Turing machine acceptance. Let

M = (Q_∃^M, Q_∀^M, q_0^M, δ^M, F^M)

be an alternating Turing machine (ATM) that runs in time
2^{O(n)} with O(n) alternations on inputs of size n.

We simulate the computation of M in M_∪[=mon, not].
Each run of M is a tree of configurations of depth 2^{k·n}, for
a suitable constant k. We may assume w.l.o.g. that terminating
computation paths of M are no longer than 2^{k·n}, i.e.,
the depth of the computation tree of M is bounded by 2^{k·n}.

By Lemma 4.2, if we use a built-in operation =mon, the
sizes of all formulas of the proof of Theorem 4.1 are linear
in the size of K(n). Now, we fix K(n) = k · n, for some
constant k.

We use the formulae C_start, Configs, AcceptingConfigs, and
φ_succ constructed as described. We define a modified version
of ψ_{k·n} which computes the set of computation paths
of length up to 2^{k·n} (this can be realized by adding “stay
transitions” ⟨C, C⟩, for C ∈ Configs, to φ_succ) and where the
states of the intermediate configurations are all from Q_∃^M if
the state of the first configuration is from Q_∃^M and are all
from Q_∀^M otherwise. We can define this as

ψ_{i+1} := ψ_i ∘ (id × id) ∘ σ_{1.C' = 2.C} ∘
           σ_{1.C.q ∈ Q_∃^M ↔ 2.C.q ∈ Q_∃^M} ∘ map(⟨C: π_{1.C}, C': π_{2.C'}⟩).

Now that we only need to consider tapes of size 2^{k·n}, the
monad algebra expressions only occupy space O(n), thus so
far we have a LOGLIN reduction.

Let the sets of configurations A_i be inductively defined as

A_1 := {C | ∃C' ⟨C, C'⟩ ∈ ψ_{k·n} ∧
        C' ∈ AcceptingConfigs ∧ C.q ∈ Q_∃^M}

A_{i+1} := {C | ∃C' ⟨C, C'⟩ ∈ ψ_{k·n} ∧
            C' ∈ (Configs − A_i) ∧ (C.q ∈ Q_∃^M ↔ C'.q ∉ Q_∃^M)}

Clearly, C ∈ A_i for odd i means that C.q ∈ Q_∃^M and that
C is eventually accepting; C ∈ A_i for even i means that
C.q ∈ Q_∀^M and that C is not eventually accepting (both via
i alternations and in 2 ! steps). 

W.l.o.g., we may assume that F^M ⊆ Q_∃^M. By this assumption
F^M ⊆ A_1, and thus the final transitions leading
to accepting states may be universal, rather than just existential.
We will now be somewhat sloppy and assume that
the number of alternations K(n) = O(n) we ask for is always
odd. This is to keep the argument short, but a slight
modification of the construction allows us to eliminate the
assumption.

Then, M accepts its input precisely if C_start is eventually
accepting with K(n) alternations, that is, iff C_start ∈ A_{K(n)}.

It is not difficult to construct A_{K(n)} in monad algebra. We
only remark that difference A − B on sets of nested tuples
can be defined using =mon and “not” as

{a ∈ A | ∄b: b ∈ B ∧ a =mon b}

or, in monad algebra on pair ⟨1: A, 2: B⟩,

pairwith_1 ∘ flatmap(⟨a: π_1, c: ⟨a: π_1, B: π_2⟩ ∘ pairwith_B ∘
    flatmap(a =mon B) ∘ not⟩ ∘ pairwith_c ∘ map(π_a)).


Formula φ_accept is obviously of linear size, and thus the
construction is in LOGLIN. This concludes our proof. □

Considering again =atomic as a built-in, our LOGSPACE-reduction
of the proof of Theorem 4.1 for configurations and
φ_succ (with K(n) = n^k) in combination with the construction
for computations (A_i and φ_accept) of the proof of Theorem 4.4
yields

Corollary 4.5. M_∪[=atomic, not] is TA[2^{n^{O(1)}}, n^{O(1)}]-hard
under LOGSPACE-reductions (query complexity).

We can give a more precise lower bound.

Theorem 4.6. M_∪[=atomic, not] is TA[2^{O(n)}, O(n)]-hard
under LOGLIN-reductions (query complexity).

Proof Sketch. To allow for a query φ_accept of linear size
overall, we have to rephrase both φ_witness-succ and ψ_{K(n)} to
use our formula defining =mon via =atomic only a constant
number of times. We can do this now that we have negation
and thus equality of a set of nested tuples available. We only
sketch the idea here briefly, but it is the same for the two
cases. Rather than testing equality linearly many times, we
postpone the testing of equality on pairs of tuples until we
have collected all the pairs in a set and we can test equality
of them all at once. We demonstrate the idea for ψ_{K(n)}. Let

ψ'_0 := φ_succ ∘ map(⟨1: id, 2: ∅⟩)

ψ'_{i+1} := ψ'_i ∘ (id × id) ∘ σ_{1.C.q ∈ Q_∃^M ↔ 2.C.q ∈ Q_∃^M} ∘
    map(⟨1: ⟨C: π_{1.1.C}, C': π_{2.1.C'}⟩,
         2: π_{1.2} ∪ π_{2.2} ∪ ⟨1: π_{1.1.C'}, 2: π_{2.1.C}⟩ ∘ sng⟩)

Now we are interested in those pairs of configurations ⟨c, c'⟩
s.t. ⟨1: ⟨C: c, C': c'⟩, 2: S⟩ ∈ ψ'_{K(n)} and for all ⟨1: t, 2: t'⟩ ∈
S, t =mon t'. We can define this as

ψ_{K(n)} := ψ'_{K(n)} ∘ map(⟨1: π_1, 2: π_2 ∘ all-equal⟩ ∘
            pairwith_2 ∘ map(π_1)) ∘ flatten

where all-equal := map((1 =mon 2) ∘ [not]) ∘ flatten ∘ not.

For φ_witness-succ, we proceed analogously. We define
φ_zoom-in to be a mapping from sets of tuples ⟨s: ⟨C, C'⟩, w:
t, w': t', mbe: S⟩ (where ⟨s: ⟨C, C'⟩, w: t, w': t'⟩ is as
in the proof of Theorem 4.1 and S is a set of pairs yet
to be checked to be equal; mbe is short for “must be
equal”) to sets of tuples of the same type. We replace e.g.
σ_{12▷34◁} ∘ π_{12▷34◁} in φ_zoom-in by

map(⟨s: π_s, w: π_{w.2}, w': π_{w'.2},
     mbe: ⟨w: π_{w.1}, w': π_{w'.1}⟩ ∘ sng ∪
          π_mbe ∘ map(⟨w: π_{w.1}, w': π_{w'.1}⟩ ∘ sng ∪
                      ⟨w: π_{w.2}, w': π_{w'.2}⟩ ∘ sng) ∘ flatten⟩)

That is, in each such step we add to mbe the values that
were checked to be equal using =mon in φ_zoom-in (here, for
σ_{12▷34◁} ∘ π_{12▷34◁}, these are w.1 and w'.1). Before
we do that, we split the pairs ⟨t, t'⟩ of mbe into their
immediate constituents (as shown in the bottom two lines of
the monad algebra expression above). This is necessary to
assure that all members of mbe are of the same type. However,
it has a nice side-effect. By this restructuring, after the
last zoom-in step, the members of mbe are pairs of atomic
values, and we actually do not need =mon here at all and
can use =atomic instead.

Note that we could not have used this construction in
the proof of Theorem 4.1 because now we need negation to
check that for each pair ⟨t, t'⟩ in mbe, t =atomic t'. (This
can be done using the “all-equal” predicate defined above,
with =atomic replacing =mon.) □
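The idea of postponing equality tests can be illustrated on the zoom-in step. The Python sketch below is a simplification under our own assumptions: it covers only the rule that zooms into the second halves, uses Python tuples for nested pairs, and the names `nest`, `split`, `zoom_second_half`, and `all_equal` are hypothetical. Instead of testing the first halves for equality at each step, the pair is recorded in `mbe`, previously recorded pairs are split into their halves so that all members keep the same shape, and a single test at the end checks all recorded pairs at once.

```python
def nest(cs):
    """Nested pairs for a list of cells whose length is a power of two."""
    if len(cs) == 1:
        return cs[0]
    half = len(cs) // 2
    return (nest(cs[:half]), nest(cs[half:]))

def split(mbe):
    """Split every pair (t, u) of nested pairs into the pairs of their halves."""
    return [(t[i], u[i]) for (t, u) in mbe for i in (0, 1)]

def zoom_second_half(w, v, mbe):
    """Zoom into the second halves; record the first halves instead of testing."""
    return w[1], v[1], [(w[0], v[0])] + split(mbe)

def all_equal(mbe):
    """True iff no recorded pair is unequal (this is where negation enters)."""
    return not any(t != u for (t, u) in mbe)

w = nest(list("abcdefgh"))
good = nest(list("abcdefXY"))      # differs from w only in the last two cells
bad = nest(list("Zbcdefgh"))       # differs from w in the first cell

state = (w, good, [])
state = zoom_second_half(*state)
state = zoom_second_half(*state)
print(state[0], state[1], all_equal(state[2]))   # ('g', 'h') ('X', 'Y') True

state = zoom_second_half(*zoom_second_half(w, bad, []))
print(all_equal(state[2]))                       # False
```

In this sketch, one further call to `split` on the final `mbe` reduces every recorded pair to a pair of single cells, so that the final check only compares atomic values.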

Since “not” is equivalent to (id = ∅),

Corollary 4.7. M_∪[=] is TA[2^{O(n)}, O(n)]-hard under
LOGLIN-reductions (query complexity).

The queries constructed in our lower bound proofs are
from flat relations to flat relations (we may assume this
since we actually use no input data value). Since relational
algebra is in PSPACE w.r.t. combined complexity (cf. e.g.
0) and presumably PSPACE ≠ NEXPTIME, it seems unlikely
that there is even a PSPACE reduction from M_∪[=]
on flat relations to relational algebra in the spirit of the
Conservativity Theorem of Paredaens and Van Gucht OH, 
Theorem 12 .. 'it . 

5. LISTS AND BAGS 

In this section, we study the complexity of monad algebra
on lists M_[] and bags M_{||}. A formal definition of these
languages is beyond the scope of this paper, but see OH El
[30] for full formal definitions. We will use the same syntax
and operation names as for monad algebra on sets, but
now, for instance, ∪ on lists means to append two lists and
“flatten” appends the list-typed members of a list in order
of appearance. For bags, these operations ignore order but
preserve duplicates. Of course, two lists are equal iff they
are of the same length and for each i, the i-th members of
the two lists are equal. Two bags are equal iff each member
of either bag occurs the same number of times in both bags.

For bags, we will also consider the additional operations
“monus” (a powerful version of difference which allows one to
express arithmetic in monad algebra on bags) and “unique”,
an operation that eliminates duplicates from bags. In [32] it
was shown that adding either of these two operations strictly
increases the expressive power of the language (and adding
both makes the language yet stronger).

First we again look at data complexity. 

Proposition 5.1 (Folklore). M_{||}[=, monus] and
M_[][=] are in TC^0 w.r.t. data complexity.

There is no space to provide a proof for this, but “parsing”
and accessing nested data is described in the proof of
Proposition 3.2 and implementing the various operations of
monad algebra is not difficult. (See also the similar proof
that XQuery is in TC^0, Theorem 8.3.) It is folklore that the
majority gates of TC^0 circuits are powerful enough to support
the arithmetic required to implement bag operations
such as bag difference.

For a result that suggests that this is a good bound,

Proposition 5.2 (nsi). There are M_{||}[=, monus]
queries that are not in AC^0.

Regarding query/combined complexity, we can show that 


Proposition 5.3. The languages M_[][=atomic, not] and
M_{||}[=atomic, not] are TA[2^{O(n)}, O(n)]-hard under LOGLIN-reductions
(query complexity).

Proposition 5.4. M_[][=atomic] and M_{||}[=atomic] are
NEXPTIME-hard w.r.t. query complexity.

The lower bound proofs for sets work without modifications
on lists and bags; we actually do not compare collections
except in the definition of A_i in the proof of Theorem 4.4,
where we compute differences. But here, we define
difference R − S as a filter (using “map”) that computes
those elements of R for which no element of S with the
same value exists. For lists, this will preserve order of the
elements in R and for bags it will preserve their multiplicities.
For the correctness of our reduction, this does not
matter, as long as we interpret nonempty collections of type
[⟨⟩] resp. {|⟨⟩|} (possibly with duplicates) as truth and empty
collections as falsity.
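The filter reading of difference can be made concrete with a small Python sketch. It is only an illustration under our assumptions: Python lists stand in for lists (or bags whose order is ignored by the reader), Python's `==` stands in for the equality test used in the reduction, and the names `difference` and `truth` are ours.

```python
def difference(r, s):
    """R − S as a filter: keep the elements of r with no equal element in s."""
    return [x for x in r if not any(x == y for y in s)]

def truth(collection):
    """Nonempty collections (duplicates allowed) count as true."""
    return len(collection) > 0

print(difference([3, 1, 3, 2], [1]))       # [3, 3, 2]
print(truth([(), ()]), truth(difference([1], [1])))   # True False
```

The first call shows that both the order of the surviving elements and their multiplicities are preserved, as stated above.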

For the upper bounds, 

Theorem 5.5. M_[][=atomic] and M_{||}[=atomic] are in
NEXPTIME w.r.t. query complexity.

Theorem 5.6. M_[][=atomic, not] and M_{||}[=atomic, not]
are in TA[2^{O(n)}, O(n)] under LOGLIN-reductions (query
complexity).

Here, the proofs of Theorems 3.7 and 3.11 work without
modifications for lists and bags. Actually, our encoding using
deterministic trees treats collections as lists, and thus
preserves both order and multiplicities of members. If only
equality on atomic values is available, however, we cannot
distinguish between sets, lists, and bags in queries. Finally,

Theorem 5.7. M_{||}[=, monus, unique] and M_[][=] are
in EXPSPACE w.r.t. query complexity.

Proof Sketch. The proof is the same as for Theorem 3.13,
but now we also have to check deep list and bag equality in
EXPSPACE.

Consider the case of M_{||}[=, monus, unique]. When we
want to check whether two bags identified by the path prefixes
π_1 and π_2 are equal, we can do this by checking whether
for each member of π_1 or π_2, its multiplicity in π_1 is the
same as in π_2. We iteratively compute each root-to-leaf
path π_1.i.v with prefix π_1. For each such path, we write π_1.i
into an exp-size register. Now, we iterate over π_1 and count
the number of π_1.j equal to π_1.i. Then we iterate over π_2
and count the number of π_2.j equal to π_1.i. If these two
counts differ, the values π_1 and π_2 are not equal. Now we
repeat the same procedure for each root-to-leaf path π_2.i.v.
The values π_1 and π_2 are equal if we have not discovered
any differing counts. Equality of the values π_1.i and π_2.j is
defined recursively, using the procedure just described, but
this is not a problem since the depth of the values is only
linear in the size of the query and we thus need only linearly
many registers to check deep equality. □
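The counting procedure for bag equality can be sketched in Python. The sketch makes assumptions of our own: bags are Python lists (order ignored), tuples are Python tuples, and values are held in memory, whereas the proof works with path prefixes in exponential-size registers and recomputes members on demand. The name `bag_equal` and its inner helpers are hypothetical.

```python
def bag_equal(b1, b2):
    """Deep equality of bags given as lists; nested lists are bags as well."""
    def eq(x, y):
        if isinstance(x, list) and isinstance(y, list):
            return bag_equal(x, y)                     # recursive deep equality
        if isinstance(x, tuple) and isinstance(y, tuple):
            return len(x) == len(y) and all(eq(a, b) for a, b in zip(x, y))
        return type(x) is type(y) and x == y           # atomic values

    def count(member, bag):
        return sum(1 for other in bag if eq(member, other))

    # each member of either bag must occur equally often in both bags
    return all(count(m, b1) == count(m, b2) for m in b1 + b2)

print(bag_equal([1, 1, 2], [2, 1, 1]))             # True
print(bag_equal([1, 1, 2], [1, 2, 2]))             # False
print(bag_equal([[1, 2], [2, 1]], [[1, 2], [1, 2]]))   # True
```

No hashing or sorting is used; the test only iterates and counts, which is the pattern the space argument relies on.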

In the remainder of this paper, we will assume that M_[]
has one further operation, “true”, which evaluates to [⟨⟩]
(true) on a list if it is nonempty and to [] (false) otherwise.
We will use “true” to eliminate duplicate entries from truth
values (i.e., from [⟨⟩, ..., ⟨⟩]). It is easy to verify that this
does not increase the complexity of M_[][=], M_[][=atomic], or
M_[][=atomic, not]. For M_[][=], in the proof of Theorem 5.7,
we only need a small number of exp-size registers to recompute
the collection (we can stop early if we find at least one
member). In the proofs of the upper bounds of the other two
fragments, “true” is a non-operation because for proving a
goal, duplicates do not matter.

6. CORE XQUERY 

We consider the fragment of XQuery with abstract syntax 

query ::= ⟨a/⟩ | ⟨a⟩query⟨/a⟩ | query query
       |  var | var/axis::φ
       |  for var in query return query
       |  if cond then query
       |  (let var := query) query
cond  ::= var = var | query

where a denotes the XML tags, axis the XPath axes (see footnote 11),
var a set of XQuery variables $x, $x_1, $x_2, ..., $y, $z, ..., and φ
a node test (either a tag name or “*”). We refer to this
fragment as Core XQuery, or XQ for short.

For simplicity, we will work with pure node-labeled un¬ 
ranked ordered trees, and by atomic values, we will refer to 
leaves (or equivalently, their labels). 

XQuery supports several forms of equality. We will not 
try to use the same syntax (=, eq, or deep_equal) as in 
the current standards proposal - it is not clear whether the 
syntax has stabilized. Throughout this paper, equality is by 
value (that is, by value as a tree rather than by the yield of 
strings at leaf nodes of the tree). We will write =deep and
=atomic for deep and atomic equality, respectively. We will
use = for statements that apply to both forms of equality.

The semantics of XQ is given in Figure 3. As for M_[],
∪ here denotes list concatenation; <doc is the depth-first
left-to-right traversal order through the tree, χ^t is the axis
relation χ on tree t, lab_*^t is true on all nodes of t, and lab_a^t, for
a a tag name, is true on those nodes of t labeled a. All XQ
queries evaluate to lists of nodes; however, we assume that
XQ variables always bind to single nodes rather than lists;
to assure this, we require that for expressions “(let $x :=
α) β”, α is either of the form ⟨a/⟩ or ⟨a⟩α_0⟨/a⟩ (i.e., this
gives an easy syntactic condition that α always evaluates to a
singleton list). This semantics is (observationally) consistent
with [44] restricted to Core XQuery.

In our definition of the syntax of Core XQuery, we have
been economical with operators introduced. Since conditions
are true iff they evaluate to a nonempty collection,

φ or ψ := φ ψ
φ and ψ := if φ then ψ
some $x in α satisfies φ := for $x in α return φ

Using deep equality, we can define negation,

not φ := ((⟨a⟩{if φ then ⟨b/⟩}⟨/a⟩) =deep ⟨a/⟩).

11 For simplicity, particularly of the following semantics, we 
will only consider the child and the descendant axis; but all 
complexity upper bounds will hold for all XPath axes. Our 
theorems will refer to “all axes”, but proofs will assume that 
this notion refers just to the two axes child and descendant. 


⟦⟨a/⟩⟧_k(e)                        := [⟨a/⟩]
⟦⟨a⟩α⟨/a⟩⟧_k(e)                    := [⟨a⟩⟦α⟧_k(e)⟨/a⟩]
⟦α β⟧_k(e)                          := ⟦α⟧_k(e) ∪ ⟦β⟧_k(e)
⟦for $x_{k+1} in α return β⟧_k(e)  := let l = ⟦α⟧_k(e), n = |l|;
                                       return ⋃_{1≤i≤n} ⟦β⟧_{k+1}(e, l_i)
⟦(let $x_{k+1} := α) β⟧_k(e)       := ⟦β⟧_{k+1}(e, ⟦α⟧_k(e))
⟦$x_i⟧_k(t_1, ..., t_n)             := [t_i]
⟦$x_i/χ::φ⟧_k(e)                    := list of nodes v of tree t s.t.
                                       ⟦$x_i⟧_k(e) = [t] ∧
                                       χ^t(root^t, v) ∧ lab_φ^t(v)
                                       in order <doc
⟦if γ then α⟧_k(e)                  := if ⟦γ⟧_k(e) then ⟦α⟧_k(e) else []
⟦$x_i = $x_j⟧_k(e_1, ..., e_k)      := if e_i = e_j then [⟨yes/⟩] else []

Figure 3: Semantics of Core XQuery. 
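
The semantics of Figure 3 can be read as a small recursive interpreter. The following Python sketch is an illustration only and not part of the paper's construction. It assumes that trees are nested tuples `(label, children)`, that queries are tagged tuples (a hypothetical encoding chosen here), and that the environment is a dictionary keyed by variable name instead of a positional tuple. Equality on nested tuples is equality by value, which plays the role of deep equality.

```python
def descendants(t):
    """Proper descendants of tree t in document order."""
    out = []
    for c in t[1]:
        out.append(c)
        out.extend(descendants(c))
    return out

def axis(t, chi):
    """Nodes reachable from the root of t via axis chi."""
    if chi == "child":
        return list(t[1])
    if chi == "descendant":
        return descendants(t)
    raise ValueError(chi)

def ev(q, env):
    """Evaluate query q in environment env; the result is a list of trees."""
    kind = q[0]
    if kind == "empty":                      # <a/>
        return [(q[1], ())]
    if kind == "elem":                       # <a> alpha </a>
        return [(q[1], tuple(ev(q[2], env)))]
    if kind == "seq":                        # alpha beta (list concatenation)
        return ev(q[1], env) + ev(q[2], env)
    if kind == "for":                        # for $x in alpha return beta
        _, x, alpha, beta = q
        out = []
        for node in ev(alpha, env):
            out += ev(beta, {**env, x: node})
        return out
    if kind == "let":                        # (let $x := alpha) beta
        _, x, alpha, beta = q
        (node,) = ev(alpha, env)             # alpha must yield a singleton list
        return ev(beta, {**env, x: node})
    if kind == "var":                        # $x
        return [env[q[1]]]
    if kind == "step":                       # $x/chi::nu
        _, x, chi, nu = q
        return [v for v in axis(env[x], chi) if nu == "*" or v[0] == nu]
    if kind == "if":                         # a condition holds iff its value is nonempty
        return ev(q[2], env) if ev(q[1], env) else []
    if kind == "eq":                         # $x = $y, by value
        return [("yes", ())] if env[q[1]] == env[q[2]] else []
    raise ValueError(kind)
```

Under this encoding, "some \$y in \$root/descendant::a satisfies φ" is just the `for` case with φ as its body, which mirrors the derived operators defined above.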

Conditions “every $\$x$ in $\alpha$ satisfies $\phi$” can be defined using
“not” and “some”. We can even assume that expressions of
the form var/path are supported, where path is any expres¬ 
sion in navigational XPath (aka. Core XPath Util 1171 1. 

It is clear that 

Proposition 6.1. Let $X$ be a set of operations and axes.
Then, each $XQ[=_{deep}, \text{not}, \text{every}, X]$ query can be
translated in LOGLIN into an equivalent $XQ[=_{deep}, X]$ query,
and each $XQ[\text{and}, \text{or}, \text{some}, X]$ query can be
translated in LOGLIN into an equivalent $XQ[X]$ query.

Next, we provide mappings between Core XQuery (using 
only the child axis) and monad algebra on lists. These show 
the equivalence of these languages up to representation is¬ 
sues, but our main aim is to provide reductions for the study 
of the complexity of XQuery. 

Translation from Core XQuery to $\mathcal{M}_\cup^{[]}$

We recursively map the data tree $T$ to a complex value
$C(T)$ as follows: Each tree node with label $t$ and children
$v_1, \dots, v_n$ is mapped to a tuple

\[
\langle \text{label} : t,\ \text{children} : [C(v_1), \dots, C(v_n)] \rangle.
\]

Modulo representation issues captured in the tree translation
function $C$, there is an equivalent monad algebra query
for each $XQ[=, \text{child}]$ query, for $=$ either $=_{deep}$ or $=_{atomic}$.

Theorem 6.2. There is a mapping

\[
MA : XQ[=, \text{child}, \text{not}] \to \mathcal{M}_\cup^{[]}[=, \text{not}]
\]

such that for each $XQ[=, \text{child}, \text{not}]$ query $Q$,

1. for any XML document tree $T$,

\[
[C(Q(T))] = MA(Q)\big([\langle N : \$ROOT,\ V : [C(T)] \rangle]\big),
\]

2. $MA(Q)$ can be computed in space $O(\log |Q|)$, and

3. $|MA(Q)| = O(|Q|)$.




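\[
MA : XQ[=, \text{child}, \text{not}] \to \big([\langle N : \text{varname},\ V : \tau \rangle] \to \tau'\big)
\]

\[
\begin{aligned}
MA(\alpha\ \beta) &:= MA(\alpha) \cup MA(\beta) \\
MA(\langle a/\rangle) &:= \langle \text{label} : a,\ \text{children} : [] \rangle \circ \text{sng} \\
MA(\langle a\rangle \alpha \langle /a\rangle) &:= \langle \text{label} : a,\ \text{children} : MA(\alpha) \rangle \circ \text{sng} \\
MA(\$x_i) &:= \pi_i \\
MA(\$x_i/t) &:= \pi_i \circ \text{flatmap}(\pi_{\text{children}} \circ \sigma_{\text{label}=t}) \\
MA(\text{for } \$x \text{ in } \alpha \text{ return } \beta) &:= \langle 1 : \text{id},\ 2 : MA(\alpha) \rangle \circ \text{pairwith}_2 \circ \text{flatmap}\big((\pi_1 \cup (\langle N : \$x,\ V : \pi_2 \rangle \circ \text{sng})) \circ MA(\beta)\big) \\
MA((\text{let } \$x := \alpha)\ \beta) &:= \langle 1 : \text{id},\ 2 : MA(\alpha) \rangle \circ \text{pairwith}_2 \circ \text{flatmap}\big((\pi_1 \cup (\langle N : \$x,\ V : \pi_2 \rangle \circ \text{sng})) \circ MA(\beta)\big) \\
MA(\text{if } \alpha \text{ then } \beta) &:= \langle 1 : \text{id},\ 2 : MA(\alpha) \circ \text{true} \rangle \circ \text{pairwith}_2 \circ \text{flatmap}(\pi_1 \circ MA(\beta)) \\
MA(\text{not } \alpha) &:= MA(\alpha) \circ \text{map}(\langle\rangle) \circ \text{not} \\
MA(\$x = \$y) &:= \langle 1 : \sigma_{N=\$x},\ 2 : \sigma_{N=\$y} \rangle \circ \text{pairwith}_1 \circ \text{flatmap}(\text{pairwith}_2) \circ \sigma_{1.V = 2.V}
\end{aligned}
\]

Figure 4: Mapping from $XQ[=, \text{child}, \text{not}]$ to $\mathcal{M}_\cup^{[]}[=, \text{not}]$.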


Proof Sketch. It is easy to verify that the function $MA$ of
Figure 4 satisfies conditions (1) to (3) of our theorem. □

For atomic equality, $\sigma_{1.V = 2.V}$ in the definition of $MA$ is
to be implemented as $\sigma_{1.V.\text{label} =_{atomic} 2.V.\text{label}}$.
Note that on an $XQ[=, \text{child}]$ query $Q$, $MA(Q)$ is an
$\mathcal{M}_\cup^{[]}[=]$ query.

Translation from $\mathcal{M}_\cup^{[]}$ to Core XQuery

Let T be the following canonical translation from complex 
values to trees: 

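\[
\begin{aligned}
T(\langle A_1 : v_1, A_2 : v_2 \rangle) &= \langle tup\rangle \langle a_1\rangle T(v_1) \langle /a_1\rangle \langle a_2\rangle T(v_2) \langle /a_2\rangle \langle /tup\rangle \\
T([v_1, \dots, v_n]) &= \langle list\rangle T(v_1) \dots T(v_n) \langle /list\rangle
\end{aligned}
\]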

Note that T is not the inverse of the mapping C that we 
introduced above to map from XML trees to complex values 
constructed from tuples and lists. 
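
To make the two encodings concrete, here is a minimal Python sketch of $C$ and $T$. It is an illustration under assumptions that are not in the text: XML trees are nested tuples `(label, children)`, tuples of a complex value are Python dicts, lists are Python lists, atomic values are strings rendered as empty elements, and an attribute name $A_i$ is turned into the tag $a_i$ by lowercasing.

```python
def C(tree):
    """XML tree (label, children) -> complex value <label: t, children: [...]>."""
    label, children = tree
    return {"label": label, "children": [C(c) for c in children]}

def T(value):
    """Complex value -> XML text, following the two cases given for T."""
    if isinstance(value, dict):                       # tuple <A1: v1, ..., Ak: vk>
        inner = "".join(
            f"<{name.lower()}>{T(v)}</{name.lower()}>" for name, v in value.items()
        )
        return f"<tup>{inner}</tup>"
    if isinstance(value, list):                       # list [v1, ..., vn]
        return "<list>" + "".join(T(v) for v in value) + "</list>"
    return f"<{value}/>"                              # atomic value (assumed rendering)
```

Applying `T` to `C(("a", ()))` yields a `tup` element with `label` and `children` subelements, not the one-node tree it started from, which illustrates why $T$ is not the inverse of $C$.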

Let My’*■ ’ 1 denote the monad algebra queries on lists and 
pairs (rather than on tuples of arbitrary arity). For both 
$=_{deep}$ and $=_{atomic}$, we have

Theorem 6.3. There is a mapping

\[
XQ : \mathcal{M}_\cup^{[]}[=] \to XQ[=, \text{child}]
\]

such that for each $\mathcal{M}_\cup^{[]}[=]$ query $Q$,

1. for any complex value $v$,

\[
T(Q(v)) = XQ(Q)(\$ROOT)(T(v)),
\]

2. $XQ(Q)$ can be computed in space $O(\log |Q|)$, and

3. if $Q$ is a monad algebra query on lists and pairs (the
fragment introduced just before this theorem), $|XQ(Q)| = O(|Q|)$.

Proof Sketch. It is easy to verify that the mapping of
Figure 5 satisfies (1) to (3). □

7. COMBINED COMPLEXITY OF XQ 

Theorems 6.2 and 6.3 provide LOGLIN-reductions back
and forth between queries in monad algebra on lists and
XQ. Note that each XML document is an XQuery. We
can compose query and data into a single query (certainly
in LOGLIN). All the complexity results of this section
involve complexity classes that are closed under
LOGLIN-reductions. Thus combined complexity for Core XQuery
with composition will never be harder than query
complexity, and we only need to study the latter.

Corollary 7.1. With respect to query complexity,

• $XQ[=_{deep}, \text{child}]$ is $TA[2^{O(n)}, O(n)]$-hard under
LOGLIN-reductions and in EXPSPACE;

• $XQ[=_{atomic}, \text{child}, \text{not}]$ is
$TA[2^{O(n)}, O(n)]$-complete under LOGLIN-reductions;

• $XQ[=_{atomic}, \text{child}]$ is NEXPTIME-complete.

Proof. This follows immediately from the results of
Section 5 on $\mathcal{M}_\cup$ and, for the upper bounds, the
LOGSPACE-reduction from $XQ[=, \text{child}]$ to $\mathcal{M}_\cup^{[]}[=]$
of Theorem 6.2, resp., for the lower bounds, the LOGLIN-reduction
from $\mathcal{M}_\cup^{[]}[=]$ to $XQ[=, \text{child}]$ of Theorem 6.3. □

The complexity classes in which our XQuery evaluation 
problems reside are large enough that minor extensions, such 
as supporting all XPath axes, do not matter. 

Theorem 7.2. W.r.t. combined complexity,

• $XQ[=_{deep}, \text{all axes}]$ is in EXPSPACE,

• $XQ[=_{atomic}, \text{all axes}, \text{not}]$ is in $TA[2^{O(n)}, O(n)]$, and

• $XQ[=_{atomic}, \text{all axes}]$ is in NEXPTIME.

A proof of this is not difficult but is beyond the scope 
of this paper and will be given in its long version. For the 
EXPSPACE bound, for instance, the proof idea is that no 
value computable by an XQ query is of more than doubly
exponential size in the size of the input (see also Proposi¬ 
tion IQ for the analogous fact for monad algebra on lists). 
Thus we can use “addresses” of singly exponential size to 
refer to subtrees of an intermediate result. While we can¬ 
not store the subtree as a whole, we can always recompute 
it from these addresses and the query. There are only lin¬ 
early many variables in an XQuery, so we need only linearly 




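\[
\begin{aligned}
XQ(\langle A_1 : f_1, \dots, A_k : f_k \rangle)(\$x) &:= T(\langle A_1 : XQ(f_1)(\$x), \dots, A_k : XQ(f_k)(\$x) \rangle) \\
XQ(\pi_{A_i})(\$x) &:= \{\$x/a_i/*\} \\
XQ(\text{sng})(\$x) &:= \langle list\rangle \{\$x\} \langle /list\rangle \\
XQ(f \circ g)(\$x) &:= (\text{let } \$y := XQ(f)(\$x))\ XQ(g)(\$y) \\
XQ(\text{map}(f))(\$x) &:= \langle list\rangle \{\text{for } \$y \text{ in } \$x/* \text{ return } XQ(f)(\$y)\} \langle /list\rangle \\
XQ(\text{id})(\$x) &:= \$x \\
XQ(\text{flatten})(\$x) &:= \langle list\rangle \{\$x/list/*\} \langle /list\rangle \\
XQ(\text{pairwith}_{A_i})(\$x) &:= \langle list\rangle \{\text{for } \$y \text{ in } \$x/a_i/list/* \text{ return } \langle tup\rangle \\
&\qquad \langle a_1\rangle \{\$x/a_1/*\} \langle /a_1\rangle \dots \langle a_{i-1}\rangle \{\$x/a_{i-1}/*\} \langle /a_{i-1}\rangle \\
&\qquad \langle a_i\rangle \{\$y\} \langle /a_i\rangle \\
&\qquad \langle a_{i+1}\rangle \{\$x/a_{i+1}/*\} \langle /a_{i+1}\rangle \dots \langle a_k\rangle \{\$x/a_k/*\} \langle /a_k\rangle \\
&\qquad \langle /tup\rangle\} \langle /list\rangle \\
XQ(f \cup g)(\$x) &:= \langle list\rangle \{(XQ(f)(\$x))/*\} \{(XQ(g)(\$x))/*\} \langle /list\rangle \\
XQ(\sigma_{A_i = A_j})(\$x) &:= \{\text{for } \$y \text{ in } \$x/* \text{ return } \{\text{if } (\$x/a_i/* = \$x/a_j/*) \text{ then } \$x\}\} \\
XQ(c)(\$x) &
\end{aligned}
\]

Figure 5: Mapping from monad algebra on lists to $XQ[=_{deep}, \text{child}]$.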


many exp-size registers to evaluate the query. This yields 
an EXPSPACE algorithm. 

Note that this EXPSPACE algorithm is quite robust and
allows adding a number of XQuery features that we
excluded from XQ, such as counting, (document position)
arithmetic, and duplicate elimination.

8. DATA COMPLEXITY OF XQ 

By Proposition 5.1, monad algebra is in LOGSPACE with
respect to data complexity. Since LOGSPACE is closed un¬ 
der composition (cf. EH) and the mapping C is clearly in 
LOGSPACE, 

Corollary 8.1. $XQ[=_{deep}, \text{child}]$ is in LOGSPACE w.r.t.
data complexity.

We can improve on this result. 

The data complexity of XQuery is so low that we need to 
be careful about how the XML data is represented. We dis¬ 
tinguish the cases of representation by a DOM tree (i.e., a 
pointer structure) and representation by a string (an XML 
document). As we show, the complexity is (presumably) 
slightly lower on strings than on trees, even though the for¬ 
mer require parsing the input. It turns out that the com¬ 
plexity bounds are precisely the same as for XPath [mom. 

Theorem 8.2. $XQ[=_{deep}, \text{all axes}]$ is LOGSPACE-complete
under $NC^1$-reductions (data complexity) if the input
is given as a DOM tree.

Proof Sketch. Let us assume that for a given XQuery,
the results of all its subqueries are already given as input.
Then we can evaluate the query in LOGSPACE because all
the space we need is a fixed number of log-sized registers
for the variables of the query. (We only need a logarithmic
number of bits to store a node id of the input.)

A fixed query consists of a fixed number of compositions.
Since LOGSPACE is closed under compositions (cf. EH) , we 
can compose the algorithm just discussed for the individual 
subqueries into a single LOGSPACE algorithm that intu¬ 
itively computes the query by precomputing its subqueries 
bottom-up (w.r.t. the syntax tree of the query). 


LOGSPACE-hardness follows from the fact that directed
tree reachability is LOGSPACE-complete under $NC^1$-reductions
[5] and directed tree reachability (checking whether
node $w$ is reachable from node $v$ in tree $t$) can be easily
encoded by mapping $t$ to an XML tree in which only $v$ has
label “v” and only $w$ has label “w”. Then the query
/descendant::v/descendant::w tests reachability of $w$ from $v$. □


Theorem 8.3. $XQ[=_{deep}, \text{all axes}]$ is in $TC^0$ w.r.t. data
complexity if the XML input is given as a character string.


Proof Sketch. We show a stronger result, that every Core
XQuery expression can be encoded as a $TC^0$ reduction that
transforms the input data into the query result.

By FOM, we denote first-order logic extended with
majority quantifiers $M$ [1]. A formula $My\, \phi(x, y)$ is true if
$\phi(x, y)$ is true for more than half of the positions $y$ of the
input. It is known that $TC^0$ is equivalent to the class of
languages recognizable using FOM sentences [1].

The reduction is encoded in FOM. A FOM reduction [1]
is a set of FOM formulae, consisting of a formula “size” s.t.
$\text{size}(s)$ iff the size of the string is $s$ and a formula $\text{pos}_a$ for
each $a \in \Sigma$ s.t. $\text{pos}_a(i)$ iff the $i$-th symbol of the string is “a”.
It is known [3] that FOM can express predicates $x = y + z$
and $x = \#y\, \phi(y)$, such that $x$ is the number of positions $y$
for which $\phi(y)$ holds. We use $\Sigma\{y \mid \phi(x, y)\}$ as a shortcut
for $\#u\, \exists y : \phi(x, y) \wedge 1 \le u \le y$.
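
As a purely semantic illustration of these ingredients (not a circuit construction), the majority quantifier and the counting predicate can be evaluated by brute force over the positions of a string. The helper names `majority`, `count` and `pos` are hypothetical and chosen for this sketch.

```python
from typing import Callable

def majority(n: int, phi: Callable[[int], bool]) -> bool:
    """M y phi(y): phi holds for more than half of the positions 1..n."""
    return 2 * sum(1 for y in range(1, n + 1) if phi(y)) > n

def count(n: int, phi: Callable[[int], bool]) -> int:
    """#y phi(y): the number of positions y in 1..n for which phi holds."""
    return sum(1 for y in range(1, n + 1) if phi(y))

def pos(text: str, a: str) -> Callable[[int], bool]:
    """pos_a(i): the i-th symbol (1-based) of the input string is a."""
    return lambda i: text[i - 1] == a
```

On the string "abaab", the symbol "a" occupies three of five positions, so the majority formula holds for `pos("abaab", "a")`; on "abab" it does not, because exactly half is not more than half.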

We will assume the document to be encoded using an
alphabet of opening and matching closing tags. For the sake
of simplicity of this proof, but without loss of generality, we
will assume that base values (e.g. strings) are encoded as
trees. The input will be a well-formed sequence of opening
and closing tags. We will identify nodes by the position of
their opening tag in the (input) string. The input is
represented using a predicate $\text{size}[\$ROOT]$ s.t. $\text{size}[\$ROOT](n)$
iff $n$ is the size of the input and predicates $\text{pos}_a[\$ROOT]$
s.t. $\text{pos}_a[\$ROOT](i)$ iff the $i$-th symbol of the input is “a”.




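\[
\begin{aligned}
\text{size}[\![\langle a/\rangle]\!]_k(e, s) &:\Leftrightarrow s = 2 \\
\text{pos}_l[\![\langle a/\rangle]\!]_k(e, i) &:\Leftrightarrow (i = 1 \Rightarrow l = \langle a\rangle) \wedge (i = 2 \Rightarrow l = \langle /a\rangle) \\
\text{size}[\![\langle a\rangle \alpha \langle /a\rangle]\!]_k(e, s) &:\Leftrightarrow \exists s' : \text{size}[\![\alpha]\!]_k(e, s') \wedge s = 2 + s' \\
\text{pos}_l[\![\langle a\rangle \alpha \langle /a\rangle]\!]_k(e, i) &:\Leftrightarrow \exists s : \text{size}[\![\langle a\rangle \alpha \langle /a\rangle]\!]_k(e, s) \wedge (i = 1 \Rightarrow l = \langle a\rangle) \wedge {} \\
&\qquad (1 < i < s \Rightarrow \text{pos}_l[\![\alpha]\!]_k(e, i - 1)) \wedge (i = s \Rightarrow l = \langle /a\rangle) \\
\text{size}[\![\alpha\ \beta]\!]_k(e, s) &:\Leftrightarrow \exists s_1 \exists s_2 : s = s_1 + s_2 \wedge \text{size}[\![\alpha]\!]_k(e, s_1) \wedge \text{size}[\![\beta]\!]_k(e, s_2) \\
\text{pos}_l[\![\alpha\ \beta]\!]_k(e, i) &:\Leftrightarrow \exists s : \text{size}[\![\alpha]\!]_k(e, s) \wedge (i \le s \Rightarrow \text{pos}_l[\![\alpha]\!]_k(e, i)) \wedge {} \\
&\qquad (i > s \Rightarrow \exists i' : i' = i - s \wedge \text{pos}_l[\![\beta]\!]_k(e, i')) \\
\text{size}[\![(\text{let } \$x_{k+1} := \alpha)\ \beta]\!]_k(e, s) &:\Leftrightarrow \text{size}[\![\beta]\!]_{k+1}(e, 1, s) \\
\text{pos}_l[\![(\text{let } \$x_{k+1} := \alpha)\ \beta]\!]_k(e, i) &:\Leftrightarrow \text{pos}_l[\![\beta]\!]_{k+1}(e, 1, i) \\
\text{size}[\![\text{for } \$x_{k+1} \text{ in } \alpha \text{ return } \beta]\!]_k(e, s) &:\Leftrightarrow s = \Sigma\{s' \mid \exists j : \text{item}[\![\alpha]\!]_k(e, j) \wedge \text{size}[\![\beta]\!]_{k+1}(e, j, s')\} \\
\text{pos}_l[\![\text{for } \$x_{k+1} \text{ in } \alpha \text{ return } \beta]\!]_k(e, i) &:\Leftrightarrow \exists s \exists j_0 : s = \Sigma\{s' \mid \exists j : j < j_0 \wedge \text{item}[\![\alpha]\!]_k(e, j) \wedge \text{size}[\![\beta]\!]_{k+1}(e, j, s')\} \wedge {} \\
&\qquad \exists s' : \text{item}[\![\alpha]\!]_k(e, s + 1) \wedge \text{size}[\![\beta]\!]_{k+1}(e, s + 1, s') \wedge {} \\
&\qquad s < i \le s + s' \wedge \text{pos}_l[\![\beta]\!]_{k+1}(e, s + 1, i - s) \\
\text{size}[\![\$x_i/\chi{::}a]\!]_k(e, s) &:\Leftrightarrow s = \Sigma\{j' - j + 1 \mid \text{axis}_\chi[\![\$x_i]\!]_k(e, 1, j) \wedge \text{node}[\![\$x_i]\!]_k(e, j, j')\} \\
\text{pos}_l[\![\$x_i/\chi{::}a]\!]_k(e, i) &:\Leftrightarrow \exists s \exists j_0 : s = \Sigma\{j' - j + 1 \mid \exists j : j < j_0 \wedge \text{axis}_\chi[\![\$x_i]\!]_k(e, 1, j) \wedge \text{node}[\![\$x_i]\!]_k(e, j, j')\} \wedge {} \\
&\qquad \exists s' \exists j_0' : \text{axis}_\chi[\![\$x_i]\!]_k(e, 1, j_0) \wedge \text{node}[\![\$x_i]\!]_k(e, j_0, j_0') \wedge s' = j_0' - j_0 + 1 \wedge {} \\
&\qquad s < i \le s + s' \wedge \text{pos}_l[\![\$x_i]\!]_k(e, i - s + j_0 - 1) \\
\text{size}[\![\$x_i]\!]_k(x_1, \dots, x_k, s) &:\Leftrightarrow \exists j : \text{node}[\![\text{expr}(\$x_i)]\!]_{i-1}(x_1, \dots, x_{i-1}, x_i, j) \wedge s = j - x_i + 1 \\
\text{pos}_l[\![\$x_i]\!]_k(x_1, \dots, x_k, i) &:\Leftrightarrow \text{pos}_l[\![\text{expr}(\$x_i)]\!]_{i-1}(x_1, \dots, x_{i-1}, x_i + i - 1) \\
\text{size}[\![\$root]\!]_k(e, s) &:\Leftrightarrow \text{size}(s) \\
\text{pos}_l[\![\$root]\!]_k(e, i) &:\Leftrightarrow \text{pos}_l(i) \\
\text{size}[\![\text{if } \phi \text{ then } \alpha]\!]_k(e, s) &:\Leftrightarrow (\text{cond}[\![\phi]\!]_k(e) \Rightarrow \text{size}[\![\alpha]\!]_k(e, s)) \wedge ((\neg \text{cond}[\![\phi]\!]_k(e)) \Rightarrow s = 0) \\
\text{pos}_l[\![\text{if } \phi \text{ then } \alpha]\!]_k(e, i) &:\Leftrightarrow \text{pos}_l[\![\alpha]\!]_k(e, i) \\
\text{cond}[\![\$x_i =_{deep} \$x_j]\!]_k(e) &:\Leftrightarrow \exists s : \text{size}[\![\$x_i]\!]_k(e, s) \wedge \text{size}[\![\$x_j]\!]_k(e, s) \wedge {} \\
&\qquad \forall p : 1 \le p \le s \Rightarrow \bigwedge_l \big(\text{pos}_l[\![\$x_i]\!](e, p) \Leftrightarrow \text{pos}_l[\![\$x_j]\!](e, p)\big) \\
\text{axis}_{descendant}[\![\alpha]\!]_k(e, i, j) &:\Leftrightarrow \exists i', j' : \text{node}[\![\alpha]\!]_k(e, i, i') \wedge \text{node}[\![\alpha]\!]_k(e, j, j') \wedge i < j \wedge j' < i' \\
\text{axis}_{child}[\![\alpha]\!]_k(e, i, j) &:\Leftrightarrow \exists i', j' : \text{node}[\![\alpha]\!]_k(e, i, i') \wedge \text{node}[\![\alpha]\!]_k(e, j, j') \wedge i < j \wedge j' < i' \wedge {} \\
&\qquad \nexists l, l' : \text{node}[\![\alpha]\!]_k(e, l, l') \wedge i < l < j \wedge j' < l' < i' \\
\text{item}[\![\alpha]\!]_k(e, i) &:\Leftrightarrow \exists i' : \text{node}[\![\alpha]\!]_k(e, i, i') \wedge \nexists j, j' : \text{node}[\![\alpha]\!]_k(e, j, j') \wedge j < i \wedge i' < j'
\end{aligned}
\]

Figure 6: FOM encoding of the Core XQuery evaluation problem.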



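Let

\[
\begin{aligned}
\text{node}[\![\alpha]\!]_k(e, i, i') :\Leftrightarrow \bigvee_{\langle a\rangle \in \Sigma} \Big( & \text{pos}_{\langle a\rangle}[\![\alpha]\!]_k(e, i) \wedge \text{pos}_{\langle /a\rangle}[\![\alpha]\!]_k(e, i') \wedge {} \\
& \#u\,(i < u < i' \wedge \text{pos}_{\langle a\rangle}[\![\alpha]\!]_k(e, u)) = \#u\,(i < u < i' \wedge \text{pos}_{\langle /a\rangle}[\![\alpha]\!]_k(e, u)) \Big)
\end{aligned}
\]

That is, $\text{node}[\![\alpha]\!]_k(e, i, i')$ is true iff $i$ and $i'$ are the
positions of an opening tag and a matching closing tag. Since we
may assume that the document is well-formed, a sufficient
condition for $i'$ being the closing tag matching the opening
tag $i$ is that the number of opening tags $\langle a\rangle$ between $i$ and
$i'$ is the same as the number of closing tags $\langle /a\rangle$ (i.e., other
tags do not have to be considered).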
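
The counting condition can be tried out on a tokenised document. The sketch below is an illustration under assumptions of its own: the input is a Python list of tag tokens such as `"<a>"` and `"</a>"`, positions are 1-based as in the text, and `matching_close` additionally picks the leftmost position that satisfies the condition. That last step is an assumption added here, because two sibling elements with the same tag also satisfy the bare counting test across the pair.

```python
def counts_balance(tokens, i, j):
    """The counting condition from the text, for 1-based positions i < j."""
    open_tag = tokens[i - 1]
    if open_tag.startswith("</") or i >= j:
        return False
    close_tag = "</" + open_tag[1:]
    if tokens[j - 1] != close_tag:
        return False
    between = tokens[i:j - 1]          # positions i+1 .. j-1
    return between.count(open_tag) == between.count(close_tag)

def matching_close(tokens, i):
    """Leftmost position j > i satisfying the counting condition, else None."""
    for j in range(i + 1, len(tokens) + 1):
        if counts_balance(tokens, i, j):
            return j
    return None
```

On `["<a>", "<b>", "</b>", "<a>", "</a>", "</a>"]` the opening tag at position 1 is matched with position 6, since the candidate at position 5 has one unmatched `<a>` before it.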

Now, our $XQ[=_{deep}]$ query $Q$ can be encoded by FOM
formulas $\text{pos}_l[\![Q]\!]_k$ and $\text{size}[\![Q]\!]_k$ as shown in Figure 6 (for
most XQ constructs). (Because of space limitations, we will
only be able to provide the full construction in the long
version of this paper, but it is straightforward to supplement
the construction for the remaining operations.) Note that
in $\text{pos}_l[\![\alpha]\!]_k(e, i)$ and $\text{size}[\![\alpha]\!]_k(e, s)$, $e$ denotes the
environment for $k$ variables, indicating positions/nodes assigned
to known variables.

Let the defining expression for a variable $\$x$, $\text{expr}(\$x)$, be
$\alpha$ if $\$x$ is introduced in an XQ expression “for $\$x$ in $\alpha$ return
$\beta$” or “let $\$x := \alpha$”.

To get an intuition for the reduction of Figure 6, consider
again our XQ semantics definition of Figure 3. There, the
environments $e$ are tuples of valuations of XQuery variables
(i.e., trees). Consider the minor reformulation of the
semantics that we get if we assume that the value of each variable
$\$x_i$ in an environment is an integer that indicates the
position of the starting tag of the node it binds to in the value
of $\text{expr}(\$x_i)$. To get a correct semantics definition along
these lines, we just have to set

\[
\begin{aligned}
[\![\$x_i]\!]_k(t_1, \dots, t_n) &:= [\![\text{expr}(\$x_i)]\!]_{i-1}(t_1, \dots, t_{i-1}) \\
[\![(\text{let } \$x_{k+1} := \alpha)\ \beta]\!]_k(e) &:= [\![\beta]\!]_{k+1}(e, 1)
\end{aligned}
\]

and an analogous definition for “for”-expressions (which sets
the value in the environment to the start index of node $v$
in $[\![\text{expr}(\$x_i)]\!]_{i-1}(t_1, \dots, t_{i-1})$ rather than the node itself).
Figure 6 now shows a direct encoding of this altered
semantics in FOM.

Considering the problem of deciding whether the root
node of the query result has a child as the decision
problem for query evaluation, we encode it in FOM (for query
$Q$) as $\exists s : \text{size}[\![Q]\!]_1(1, s) \wedge s > 2$. □
APPENDIX

Proof of Proposition 13721 (Sketch). For simplicity, we 
will here assume that all tuples are pairs, but the proof 
immediately generalizes to tuples of higher arity. 

We assume complex values given as strings constructed 
using symbols from alphabet $\Sigma$ consisting of (, ), {, },
and character symbols for atomic values. 

For example, the value 



of type 

$\{\langle A : Dom, B : Dom \rangle\}$

is represented as string “{(a,b),(c,d)}”.

Given a complex value v, we identify (set-, pair-, and 
atomic) terms of v (i.e., nodes of the tree shown above) 
by the index of the first symbol of the term in the input 
string. For instance, the root node is identified with index 
1 because the opening curly brace is the first symbol of the 
input string. 

Let $I_v = \{1, \dots, |v|\}$. Let flat be a function that maps
every complex value $v$ of type $\tau$ to a relational structure
(Set, Pair, Atomic) with relations $Set \subseteq I_v^2$, $Pair \subseteq I_v^3$, and
$Atomic \subseteq I_v \times Dom$, where $(x, y) \in Set$ iff there is a set-term
$t_v(x)$ in $v$ that has a term $t_v(y)$ as member, $(x, y, z) \in Pair$
iff there is a pair-term $t = (t_1, t_2)$ in $v$ with $t = t_v(x)$,
$t_1 = t_v(y)$, and $t_2 = t_v(z)$, and $(x, w) \in Atomic$ iff there is
an atomic term $t = w$ in $v$ with $t = t_v(x)$.

For the example value discussed above, 

Atomic = {(3, a), (5, b), (9, c), (11, d)} 

Set = {(1,2), (1,8)} 

Pair = {(2,3,5), (8,9, 11)} 
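
The three relations for the running example can be reproduced with a short Python sketch. It is an illustration only: it uses a sequential recursive-descent parser, not the $TC^0$ reduction discussed in the proof, and it assumes that the input string contains no whitespace and that atomic values are single characters.

```python
def flat(s):
    """Compute (Atomic, Set, Pair) for a complex value given as a string.

    Terms are identified by the 1-based index of their first symbol.
    """
    atomic, set_rel, pair_rel = set(), set(), set()

    def parse(p):
        """Parse the term at 0-based offset p; return (identifier, next offset)."""
        ident = p + 1
        if s[p] == "{":                      # set-term: members separated by commas
            p += 1
            while s[p] != "}":
                child, p = parse(p)
                set_rel.add((ident, child))
                if s[p] == ",":
                    p += 1
            return ident, p + 1
        if s[p] == "(":                      # pair-term (t1,t2)
            left, p = parse(p + 1)
            assert s[p] == ","
            right, p = parse(p + 1)
            assert s[p] == ")"
            pair_rel.add((ident, left, right))
            return ident, p + 1
        atomic.add((ident, s[p]))            # atomic term: one character (assumed)
        return ident, p + 1

    parse(0)
    return atomic, set_rel, pair_rel
```

Calling `flat("{(a,b),(c,d)}")` returns exactly the relations Atomic, Set and Pair listed above.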

We have the power of $TC^0$ at hand to define a reduction
from our input strings to the flat relations. We will not go
into the details of a $TC^0$ reduction (cf. [1]); these are
technical but in this case easy. The only point worth mentioning
is that we can check whether two indexes $i, j$ are the left
and right delimiters of a set or tuple. We show this in
first-order logic with majority quantifiers (FOM). By [1], $TC^0$ =
FOM. It is also known [3] that FOM can express predicates
$x = y + z$ and $x = \#y\, \phi(y)$, such that $x$ is the number of
positions $y$ for which $\phi(y)$ holds.

\[
\begin{aligned}
\text{set-node}(i, j) &:= Q_{\{}(i) \wedge Q_{\}}(j) \wedge x = \#u\,(Q_{\{}(u) \wedge i < u < j) \wedge y = \#u\,(Q_{\}}(u) \wedge i < u < j) \wedge x = y \\
\text{tuple-node}(i, j) &:= Q_{(}(i) \wedge Q_{)}(j) \wedge x = \#u\,(Q_{(}(u) \wedge i < u < j) \wedge y = \#u\,(Q_{)}(u) \wedge i < u < j) \wedge x = y
\end{aligned}
\]

where $(Q_a)_{a \in \Sigma}$ represents the input string and $Q_a(i)$ is true
iff symbol $a$ is at position $i$ of the input string. (That is,
these formulae state that $i, j$ are positions of symbols with
matching opening and closing delimiters and the number of
opening delimiters occurring between $i$ and $j$ is the same as
the number of closing delimiters.)

Atomic nodes can already be defined in FO. Let “node”
denote nodes of any of the three kinds. Now, for instance,

\[
\phi_{Set}(i, j) := \exists i', j' : \text{set-node}(i, i') \wedge \text{node}(j, j') \wedge i < j \wedge j' < i' \wedge \neg \exists k, k' : \text{node}(k, k') \wedge i < k < j \wedge j' < k' < i'.
\]

This formula states that $i$ is the identifier of a set-node and
$j$ the identifier of one of its children.

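Let the $\mathcal{M}_\cup[\sigma]$ query $V_\tau$ for the type of the input data be
defined inductively as

\[
\begin{aligned}
V_{Dom} &:= Atomic \circ \text{map}(\langle 1 : \pi_1,\ 2 : \pi_2 \circ \text{sng} \rangle) \\
V_{\langle A : \tau_1, B : \tau_2 \rangle} &:= Pair \circ \text{map}(\langle 1 : \pi_1,\ 2 : V_{\tau_1}|_{\pi_2} \times V_{\tau_2}|_{\pi_3} \rangle) \\
V_{\{\tau\}} &:= Set \circ \langle 1 : \text{map}(\pi_1),\ 2 : \text{id} \rangle \circ \text{pairwith}_1 \circ {} \\
&\qquad \text{map}(\langle 1 : \pi_1,\ 2 : \pi_2|_{\pi_1} \circ (\text{id} \times V_\tau) \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle)
\end{aligned}
\]

where $S|_v = \langle 1 : v,\ 2 : S \rangle \circ \text{pairwith}_2 \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2})$.

For our example, we get $V_\tau$ as shown in Figure 7.

It is not difficult to verify that for every complex value
$v$ of type $\tau$, $V_\tau(\text{flat}(v)) = \{\langle 1 : i,\ 2 : \{v\} \rangle\}$, where $i$ is the
identifier of the root of $v$, and that $V' := V_\tau \circ \text{map}(\pi_2) \circ
\text{flatten}$ computes $\{v\}$.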

By Theorem for every $\mathcal{M}_\cup[\sigma]$ query from flat
relations to flat relations there is an equivalent relational
algebra query. Thus, for any Boolean $\mathcal{M}_\cup[\sigma]$ query $Q$, there is
a relational algebra query $Q' = V' \circ \text{map}(Q) \circ \text{flatten}$. Of
course,

\[
Q(v) = (V' \circ \text{map}(Q) \circ \text{flatten})(\text{flat}(v)) = Q'(\text{flat}(v)).
\]

For a fixed query $Q$ (and thus a fixed type $\tau$), $Q'$ is fixed
and can be evaluated on a (flat relational) database in $AC^0$
(cf. e.g. EH) and thus in $TC^0$. Preprocessing function
flat is in $TC^0$, so we can compose these two steps and get a
$TC^0$ overall bound. □



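\[
\begin{aligned}
V_{Dom} &= \{\langle 3, \{a\} \rangle, \langle 5, \{b\} \rangle, \langle 9, \{c\} \rangle, \langle 11, \{d\} \rangle\} \\
V_{\langle A:Dom, B:Dom \rangle} &= \{\langle 2, 3, 5 \rangle, \langle 8, 9, 11 \rangle\} \circ \text{map}(\langle 1 : \pi_1,\ 2 : V_{Dom}|_{\pi_2} \times V_{Dom}|_{\pi_3} \rangle) \\
&= \{\langle 2, V_{Dom}|_3 \times V_{Dom}|_5 \rangle, \langle 8, V_{Dom}|_9 \times V_{Dom}|_{11} \rangle\} \\
&= \{\langle 2, \{a\} \times \{b\} \rangle, \langle 8, \{c\} \times \{d\} \rangle\} \\
&= \{\langle 2, \{\langle a, b \rangle\} \rangle, \langle 8, \{\langle c, d \rangle\} \rangle\} \\
V_{\{\langle A:Dom, B:Dom \rangle\}} &= \{\langle 1, 2 \rangle, \langle 1, 8 \rangle\} \circ \langle 1 : \text{map}(\pi_1),\ 2 : \text{id} \rangle \circ \text{pairwith}_1 \circ {} \\
&\qquad \text{map}(\langle 1 : \pi_1,\ 2 : \pi_2|_{\pi_1} \circ (\text{id} \times V_{\langle A:Dom, B:Dom \rangle}) \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle) \\
&= \langle 1 : \{1\},\ 2 : \{\langle 1, 2 \rangle, \langle 1, 8 \rangle\} \rangle \circ \text{pairwith}_1 \circ {} \\
&\qquad \text{map}(\langle 1 : \pi_1,\ 2 : \pi_2|_{\pi_1} \circ (\text{id} \times V_{\langle A:Dom, B:Dom \rangle}) \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle) \\
&= \{\langle 1 : 1,\ 2 : \{\langle 1, 2 \rangle, \langle 1, 8 \rangle\} \rangle\} \circ {} \\
&\qquad \text{map}(\langle 1 : \pi_1,\ 2 : \pi_2|_{\pi_1} \circ (\text{id} \times V_{\langle A:Dom, B:Dom \rangle}) \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle) \\
&= \{\langle 1 : 1,\ 2 : \{2, 8\} \circ (\text{id} \times V_{\langle A:Dom, B:Dom \rangle}) \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle\} \\
&= \{\langle 1 : 1,\ 2 : (\{2, 8\} \times \{\langle 2, \{\langle a, b \rangle\} \rangle, \langle 8, \{\langle c, d \rangle\} \rangle\}) \circ \sigma_{1=2.1} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle\} \\
&= \{\langle 1 : 1,\ 2 : \{\langle 2, \langle 2, \{\langle a, b \rangle\} \rangle \rangle, \langle 8, \langle 8, \{\langle c, d \rangle\} \rangle \rangle\} \circ \text{map}(\pi_{2.2}) \circ \text{flatten} \circ \text{sng} \rangle\} \\
&= \{\langle 1 : 1,\ 2 : \{\{\langle a, b \rangle\}, \{\langle c, d \rangle\}\} \circ \text{flatten} \circ \text{sng} \rangle\} \\
&= \{\langle 1 : 1,\ 2 : \{\{\langle a, b \rangle, \langle c, d \rangle\}\} \rangle\}
\end{aligned}
\]

Figure 7: $V_\tau$ for the running example.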





